Video coding with content adaptive spatially varying quantization

ABSTRACT

A video encoder may be configured to apply a multi-stage quantization process, where residuals are first quantized using an effective quantization parameter derived from the statistics of the samples of the block. The residual is then further quantized using a base quantization parameter that is uniform across a picture. A video decoder may be configured to decode the video data using the base quantization parameter. The video decoder may further be configured to estimate the effective quantization parameter from the statistics of the decoded samples of the block. The video decoder may then use the estimated effective quantization parameter in determining parameters for other coding tools, including filters.

This application is a Continuation of application Ser. No. 16/155,344, filed Oct. 9, 2018, which claims the benefit of U.S. Provisional Application No. 62/571,732, filed Oct. 12, 2017, the entire content of each of which is incorporated by reference herein.

TECHNICAL FIELD

This disclosure relates to video coding and/or video processing.

BACKGROUND

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265, High Efficiency Video Coding (HEVC), and extensions of such standards. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which then may be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.

The total number of color values that may be captured, coded, and displayed may be defined by a color gamut. A color gamut refers to the range of colors that a device can capture (e.g., a camera) or reproduce (e.g., a display). Often, color gamuts differ from device to device. For video coding, a predefined color gamut for video data may be used such that each device in the video coding process may be configured to process pixel values in the same color gamut. Some color gamuts are defined with a larger range of colors than color gamuts that have been traditionally used for video coding. Such color gamuts with a larger range of colors may be referred to as a wide color gamut (WCG).

Another aspect of video data is dynamic range. Dynamic range is typically defined as the ratio between the minimum and maximum brightness (e.g., luminance) of a video signal. The dynamic range of common video data used in the past is considered to have a standard dynamic range (SDR). Other example specifications for video data define color data that has a larger ratio between the minimum and maximum brightness. Such video data may be described as having a high dynamic range (HDR).

SUMMARY

This disclosure describes example processing methods (and devices configured to perform the methods) applied in the coding (e.g., encoding or decoding) loop of a video coding system. The techniques of this disclosure are applicable to coding of video data representations with a non-uniformly distributed perceived just-noticeable-difference (e.g., signal-to-noise ratio) of the video data over its dynamic range. A video encoder may be configured to apply a multi-stage quantization process, where residuals are first quantized using an effective quantization parameter derived from the statistics of the samples of the block. The residual is then further quantized using a base quantization parameter that is uniform across a picture. A video decoder may be configured to decode the video data using the base quantization parameter. The video decoder may further be configured to estimate the effective quantization parameter from the statistics of the decoded samples of the block. The video decoder may then use the estimated effective quantization parameter in determining parameters for other coding tools, including filters. In this way, signaling overhead is saved, as the effective quantization parameter is not signaled but is estimated at the decoder side.
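
To make this flow concrete, the following is a minimal sketch, assuming 10-bit luma samples and a purely hypothetical mapping from block statistics to a quantization parameter (QP) offset. The disclosure does not fix a particular mapping; the key point is only that the encoder and decoder derive the offset from the same statistics, so no deltaQP syntax is needed:

```python
import numpy as np

def qp_offset_from_stats(samples: np.ndarray) -> int:
    """Hypothetical mapping from block statistics to a QP offset.

    Under a PQ-style transfer function, brighter blocks tolerate coarser
    quantization, so higher mean luma maps to a larger offset here. The
    thresholds below are illustrative only.
    """
    mean_luma = float(samples.mean())
    if mean_luma < 256:      # dark block: quantize more finely
        return -2
    if mean_luma < 512:      # mid-tones: no change
        return 0
    return 3                 # bright block: quantize more coarsely

def encoder_effective_qp(block: np.ndarray, base_qp: int) -> int:
    # Encoder side: residuals are quantized with the effective QP, while
    # only the picture-uniform base QP is written to the bitstream.
    return base_qp + qp_offset_from_stats(block)

def decoder_estimated_qp(decoded_block: np.ndarray, base_qp: int) -> int:
    # Decoder side: the offset is not signaled; it is re-estimated from
    # the decoded samples and used to set QP-dependent tools such as
    # deblocking filter strength.
    return base_qp + qp_offset_from_stats(decoded_block)
```

Because both sides apply the same statistics-to-offset rule, the decoder's estimate tracks the encoder's effective QP without any per-block syntax.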

In one example, this disclosure describes a method of decoding video data, the method comprising receiving an encoded block of the video data, the encoded block of the video data having been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter, determining the base quantization parameter used to encode the encoded block of the video data, decoding the encoded block of the video data using the base quantization parameter to create a decoded block of video data, determining an estimate of the quantization parameter offset for the decoded block of the video data based on statistics associated with the decoded block of the video data, adding the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of the effective quantization parameter, and performing one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.

In another example, this disclosure describes a method of encoding video data, the method comprising determining a base quantization parameter for a block of the video data, determining a quantization parameter offset for the block of the video data based on statistics associated with the block of the video data, adding the quantization parameter offset to the base quantization parameter to create an effective quantization parameter, and encoding the block of the video data using the effective quantization parameter and the base quantization parameter.

In another example, this disclosure describes an apparatus configured to decode video data, the apparatus comprising a memory configured to store an encoded block of the video data, and one or more processors in communication with the memory, the one or more processors configured to receive the encoded block of the video data, the encoded block of the video data having been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter, determine the base quantization parameter used to encode the encoded block of the video data, decode the encoded block of the video data using the base quantization parameter to create a decoded block of video data, determine an estimate of the quantization parameter offset for the decoded block of the video data based on statistics associated with the decoded block of the video data, add the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of the effective quantization parameter, and perform one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.

In another example, this disclosure describes an apparatus configured to encode video data, the apparatus comprising a memory configured to store a block of the video data, and one or more processors in communication with the memory, the one or more processors configured to determine a base quantization parameter for the block of the video data, determine a quantization parameter offset for the block of the video data based on statistics associated with the block of the video data, add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter, and encode the block of the video data using the effective quantization parameter and the base quantization parameter.

In another example, this disclosure describes an apparatus configured to decode video data, the apparatus comprising means for receiving an encoded block of the video data, the encoded block of the video data having been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter, means for determining the base quantization parameter used to encode the encoded block of the video data, means for decoding the encoded block of the video data using the base quantization parameter to create a decoded block of video data, means for determining an estimate of the quantization parameter offset for the decoded block of the video data based on statistics associated with the decoded block of the video data, means for adding the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of the effective quantization parameter, and means for performing one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.

In another example, this disclosure describes an apparatus configured to encode video data, the apparatus comprising means for determining a base quantization parameter for a block of the video data, means for determining a quantization parameter offset for the block of the video data based on statistics associated with the block of the video data, means for adding the quantization parameter offset to the base quantization parameter to create an effective quantization parameter, and means for encoding the block of the video data using the effective quantization parameter and the base quantization parameter.

In another example, this disclosure describes a non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to receive an encoded block of the video data, the encoded block of the video data having been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter, determine the base quantization parameter used to encode the encoded block of the video data, decode the encoded block of the video data using the base quantization parameter to create a decoded block of video data, determine an estimate of the quantization parameter offset for the decoded block of the video data based on statistics associated with the decoded block of the video data, add the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of the effective quantization parameter, and perform one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.

In another example, this disclosure describes a non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to determine a base quantization parameter for a block of the video data, determine a quantization parameter offset for the block of the video data based on statistics associated with the block of the video data, add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter, and encode the block of the video data using the effective quantization parameter and the base quantization parameter.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example video encoding and decoding system configured to implement the techniques of the disclosure.

FIGS. 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure, and a corresponding coding tree unit (CTU).

FIG. 3 is a conceptual drawing illustrating the concepts of HDR data.

FIG. 4 is a conceptual diagram illustrating example color gamuts.

FIG. 5 is a flow diagram illustrating an example of HDR/WCG representation conversion.

FIG. 6 is a flow diagram illustrating an example of HDR/WCG inverse conversion.

FIG. 7 is a conceptual diagram illustrating examples of electro-optical transfer functions (EOTFs) utilized for video data conversion (including SDR and HDR) from perceptually uniform code levels to linear luminance.

FIG. 8 is a block diagram illustrating an example of a video encoder that may implement techniques of this disclosure.

FIG. 9 is a block diagram illustrating an example quantization unit of a video encoder that may implement techniques of this disclosure.

FIG. 10 is a block diagram illustrating an example of a video decoder that may implement techniques of this disclosure.

FIG. 11 is a flowchart illustrating an example encoding method.

FIG. 12 is a flowchart illustrating an example decoding method.

DETAILED DESCRIPTION

This disclosure is related to the processing and/or coding of video data with high dynamic range (HDR) and wide color gamut (WCG) representations. More specifically, the techniques of this disclosure include content-adaptive spatially varying quantization without explicit signaling of quantization parameters (e.g., a change in a quantization parameter represented by a deltaQP syntax element) to efficiently compress HDR/WCG video signals. The techniques and devices described herein may improve compression efficiency of video coding systems utilized for coding HDR and WCG video data. The techniques of this disclosure may be used in the context of advanced video codecs, such as extensions of HEVC or next generation video coding standards.

Video coding standards, including hybrid-based video coding standards, include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multi-view Video Coding (MVC) extensions. The design of a new video coding standard, namely High Efficiency Video Coding (HEVC, also called H.265), has been finalized by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). An HEVC draft specification referred to as HEVC Working Draft 10 (WD10), Bross et al., “High efficiency video coding (HEVC) text specification draft 10 (for FDIS & Last Call),” Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 12th Meeting: Geneva, CH, 14-23 Jan. 2013, JCTVC-L1003v34, is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip. The finalized HEVC standard is referred to as HEVC version 1. The finalized HEVC standard document is published as ITU-T H.265, Series H: Audiovisual and Multimedia Systems, Infrastructure of audiovisual services—Coding of moving video, High efficiency video coding, Telecommunication Standardization Sector of International Telecommunication Union (ITU), April 2013, and another version of the finalized HEVC standard was published in October 2014. A copy of the H.265/HEVC specification text may be downloaded from http://www.itu.int/rec/T-REC-H.265-201504-I/en.

ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are now studying the potential need for standardization of future video coding technology with a compression capability that exceeds that of the current HEVC standard (including its current extensions and near-term extensions for screen content coding and high-dynamic-range coding). The groups are working together on this exploration activity in a joint collaboration effort known as the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by their experts in this area. The JVET first met during 19-21 Oct. 2015. The latest version of the reference software, Joint Exploration Model 7 (JEM7), can be downloaded from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/tags/HM-16.6-JEM-7.0/. The algorithm description for JEM7 is given in J. Chen, E. Alshina, G. J. Sullivan, J.-R. Ohm, J. Boyce, “Algorithm description of Joint Exploration Test Model 7 (JEM7),” JVET-G1001, Torino, July 2017.

Recently, a new video coding standard, referred to as the Versatile Video Coding (VVC) standard, has been under development by the Joint Video Expert Team (JVET) of VCEG and MPEG. An early draft of VVC is available in the document JVET-J1001, “Versatile Video Coding (Draft 1),” and its algorithm description is available in the document JVET-J1002, “Algorithm description for Versatile Video Coding and Test Model 1 (VTM 1).”

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize techniques of this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 provides the video data to destination device 14 via a computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, so-called “smart” pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive the encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wired or wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

In other examples, computer-readable medium 16 may include non-transitory storage media, such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable medium 16 may be understood to include one or more computer-readable media of various forms, in various examples.

In some examples, encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by input interface. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting encoded video data to the destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of FIG. 1, source device 12 includes video source 18, dynamic range adjustment (DRA) unit 19, video encoder 20, and output interface 22. Destination device 14 includes input interface 28, video decoder 30, inverse DRA unit 31, and display device 32. In accordance with this disclosure, DRA unit 19 of source device 12 may be configured to implement the techniques of this disclosure, including signaling and related operations applied to video data in certain color spaces to enable more efficient compression of HDR and WCG video data. In some examples, DRA unit 19 may be separate from video encoder 20. In other examples, DRA unit 19 may be part of video encoder 20. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.

The illustrated system 10 of FIG. 1 is merely one example. Techniques for processing and coding HDR and WCG video data may be performed by any digital video encoding and/or video decoding device. Moreover, some example techniques of this disclosure may also be performed by a video preprocessor and/or video postprocessor. A video preprocessor may be any device configured to process video data before encoding (e.g., before HEVC, VVC, or other encoding). A video postprocessor may be any device configured to process video data after decoding (e.g., after HEVC, VVC, or other decoding). Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner such that each of devices 12, 14 includes video encoding and decoding components, as well as a video preprocessor and a video postprocessor (e.g., DRA unit 19 and inverse DRA unit 31, respectively). Hence, system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.

Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding and video processing, in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto a computer-readable medium 16.

Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that includes syntax elements that describe characteristics and/or processing of blocks and other coded units, e.g., groups of pictures (GOPs). Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.

DRA unit 19 and inverse DRA unit 31 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, DSPs, ASICs, FPGAs, discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure.

In some examples, video encoder 20 and video decoder 30 operate according to a video compression standard, such as ITU-T H.265/HEVC, VVC, or other next generation video coding standards.

In HEVC and other video coding standards, a video sequence typically includes a series of pictures. Pictures may also be referred to as “frames.” A picture may include three sample arrays, denoted S_(L), S_(Cb), and S_(Cr). S_(L) is a two-dimensional array (i.e., a block) of luma samples. S_(Cb) is a two-dimensional array of Cb chrominance samples. S_(Cr) is a two-dimensional array of Cr chrominance samples. Chrominance samples may also be referred to herein as “chroma” samples. In other instances, a picture may be monochrome and may only include an array of luma samples.

Video encoder 20 may generate a set of coding tree units (CTUs). Each of the CTUs may comprise a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and syntax structures used to code the samples of the coding tree blocks. In a monochrome picture or a picture that has three separate color planes, a CTU may comprise a single coding tree block and syntax structures used to code the samples of the coding tree block. A coding tree block may be an N×N block of samples. A CTU may also be referred to as a “tree block” or a “largest coding unit” (LCU). The CTUs of HEVC may be broadly analogous to the macroblocks of other video coding standards, such as H.264/AVC. However, a CTU is not necessarily limited to a particular size and may include one or more coding units (CUs). A slice may include an integer number of CTUs ordered consecutively in the raster scan.

This disclosure may use the term “video unit” or “video block” to refer to one or more blocks of samples and syntax structures used to code samples of the one or more blocks of samples. Example types of video units may include CTUs, CUs, PUs, transform units (TUs) in HEVC, or macroblocks, macroblock partitions, and so on in other video coding standards.

To generate a coded CTU, video encoder 20 may recursively perform quad-tree partitioning on the coding tree blocks of a CTU to divide the coding tree blocks into coding blocks, hence the name “coding tree units.” A coding block is an N×N block of samples. A CU may comprise a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture that has a luma sample array, a Cb sample array and a Cr sample array, and syntax structures used to code the samples of the coding blocks. In a monochrome picture or a picture that has three separate color planes, a CU may comprise a single coding block and syntax structures used to code the samples of the coding block.

Video encoder 20 may partition a coding block of a CU into one or more prediction blocks. A prediction block may be a rectangular (i.e., square or non-square) block of samples on which the same prediction is applied. A prediction unit (PU) of a CU may comprise a prediction block of luma samples, two corresponding prediction blocks of chroma samples of a picture, and syntax structures used to predict the prediction block samples. In a monochrome picture or a picture that has three separate color planes, a PU may comprise a single prediction block and syntax structures used to predict the prediction block samples. Video encoder 20 may generate predictive luma, Cb and Cr blocks for luma, Cb and Cr prediction blocks of each PU of the CU.

In JEM7, rather than using the quadtree partitioning structure of HEVC described above, a quadtree binary tree (QTBT) partitioning structure may be used. The QTBT structure removes the concepts of multiple partition types. That is, the QTBT structure removes the separation of the CU, PU, and TU concepts, and supports more flexibility for CU partition shapes. In the QTBT block structure, a CU can have either a square or rectangular shape. In one example, a CU is first partitioned by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree structure.

In some examples, there are two splitting types: symmetric horizontal splitting and symmetric vertical splitting. The binary tree leaf nodes are called CUs, and that segmentation (i.e., the CU) is used for prediction and transform processing without any further partitioning. This means that the CU, PU, and TU have the same block size in the QTBT coding block structure. In JEM, a CU sometimes consists of coding blocks (CBs) of different color components. For example, one CU contains one luma CB and two chroma CBs in the case of P and B slices of the 4:2:0 chroma format, and sometimes consists of a CB of a single component. For example, one CU contains only one luma CB or just two chroma CBs in the case of I slices.

In some examples, video encoder 20 and video decoder 30 may be configured to operate according to JEM/VVC. According to JEM/VVC, a video coder (such as video encoder 20) partitions a picture into a plurality of CUs. An example QTBT structure of JEM includes two levels: a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. A root node of the QTBT structure corresponds to a CTU. Leaf nodes of the binary trees correspond to coding units (CUs).

In some examples, video encoder 20 and video decoder 30 may use a single QTBT structure to represent each of the luminance and chrominance components, while in other examples, video encoder 20 and video decoder 30 may use two or more QTBT structures, such as one QTBT structure for the luminance component and another QTBT structure for both chrominance components (or two QTBT structures for respective chrominance components).

Video encoder 20 and video decoder 30 may be configured to use quadtree partitioning per HEVC, QTBT partitioning according to JEM/VVC, or other partitioning structures. For purposes of explanation, the description of the techniques of this disclosure is presented with respect to QTBT partitioning. However, it should be understood that the techniques of this disclosure may also be applied to video coders configured to use quadtree partitioning, or other types of partitioning as well.

FIGS. 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure 130, and a corresponding coding tree unit (CTU) 132. The solid lines represent quadtree splitting, and dotted lines indicate binary tree splitting. In each split (i.e., non-leaf) node of the binary tree, one flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting in this example. For the quadtree splitting, there is no need to indicate the splitting type, since quadtree nodes split a block horizontally and vertically into 4 sub-blocks with equal size. Accordingly, video encoder 20 may encode, and video decoder 30 may decode, syntax elements (such as splitting information) for a region tree level of QTBT structure 130 (i.e., the solid lines) and syntax elements (such as splitting information) for a prediction tree level of QTBT structure 130 (i.e., the dashed lines). Video encoder 20 may encode, and video decoder 30 may decode, video data, such as prediction and transform data, for CUs represented by terminal leaf nodes of QTBT structure 130.

In general, CTU 132 of FIG. 2B may be associated with parameters defining sizes of blocks corresponding to nodes of QTBT structure 130 at the first and second levels. These parameters may include a CTU size (representing a size of CTU 132 in samples), a minimum quadtree size (MinQTSize, representing a minimum allowed quadtree leaf node size), a maximum binary tree size (MaxBTSize, representing a maximum allowed binary tree root node size), a maximum binary tree depth (MaxBTDepth, representing a maximum allowed binary tree depth), and a minimum binary tree size (MinBTSize, representing the minimum allowed binary tree leaf node size).

The root node of a QTBT structure corresponding to a CTU may have four child nodes at the first level of the QTBT structure, each of which may be partitioned according to quadtree partitioning. That is, nodes of the first level are either leaf nodes (having no child nodes) or have four child nodes. The example of QTBT structure 130 represents such nodes as including the parent node and child nodes having solid lines for branches. If a node of the first level is not larger than the maximum allowed binary tree root node size (MaxBTSize), then the node can be further partitioned by respective binary trees. The binary tree splitting of one node can be iterated until the nodes resulting from the split reach the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The example of QTBT structure 130 represents such nodes as having dashed lines for branches. The binary tree leaf node is referred to as a coding unit (CU), which is used for prediction (e.g., intra-picture or inter-picture prediction) and transform, without any further partitioning. As discussed above, CUs may also be referred to as “video blocks” or “blocks.”

In one example of the QTBT partitioning structure, the CTU size is set as 128×128 (luma samples and two corresponding 64×64 chroma samples), the MinQTSize is set as 16×16, the MaxBTSize is set as 64×64, the MinBTSize (for both width and height) is set as 4, and the MaxBTDepth is set as 4. The quadtree partitioning is applied to the CTU first to generate quad-tree leaf nodes. The quadtree leaf nodes may have a size from 16×16 (i.e., the MinQTSize) to 128×128 (i.e., the CTU size). If the leaf quadtree node is 128×128, then the node will not be further split by the binary tree, since the size exceeds the MaxBTSize (i.e., 64×64, in this example). Otherwise, the leaf quadtree node will be further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node for the binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (4, in this example), no further splitting is permitted. A binary tree node having a width equal to MinBTSize (4, in this example) implies that no further horizontal splitting is permitted. Similarly, a binary tree node having a height equal to MinBTSize implies that no further vertical splitting is permitted for that binary tree node. As noted above, leaf nodes of the binary tree are referred to as CUs, and are further processed according to prediction and transform without further partitioning.
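
A minimal sketch of these split constraints follows, assuming the example parameter values above. The split names and the flag marking a node that is already inside a binary tree are illustrative; a real encoder additionally applies rate-distortion decisions and picture-boundary rules:

```python
CTU_SIZE = 128
MIN_QT_SIZE = 16
MAX_BT_SIZE = 64
MAX_BT_DEPTH = 4
MIN_BT_SIZE = 4

def allowed_splits(width: int, height: int, bt_depth: int, in_binary_tree: bool):
    """Return the partitioning options open to a QTBT node."""
    splits = []
    # Quadtree splits occur only before any binary split, on square
    # nodes larger than the minimum quadtree leaf size.
    if not in_binary_tree and width == height and width > MIN_QT_SIZE:
        splits.append("quad")
    # Binary splits require the node to fit within MaxBTSize and the
    # binary tree depth to be below MaxBTDepth.
    if max(width, height) <= MAX_BT_SIZE and bt_depth < MAX_BT_DEPTH:
        if width > MIN_BT_SIZE:
            splits.append("halve-width")   # two side-by-side children
        if height > MIN_BT_SIZE:
            splits.append("halve-height")  # two stacked children
    return splits

# A 128x128 quadtree leaf exceeds MaxBTSize, so only a quad split applies:
print(allowed_splits(128, 128, bt_depth=0, in_binary_tree=False))  # ['quad']
# A 64x64 leaf may quad split or start a binary tree:
print(allowed_splits(64, 64, bt_depth=0, in_binary_tree=False))
```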

Video encoder 20 may use intra prediction or inter prediction to generate the predictive blocks for a PU. If video encoder 20 uses intra prediction to generate the predictive blocks of a PU, video encoder 20 may generate the predictive blocks of the PU based on decoded samples of the picture associated with the PU.

If video encoder 20 uses inter prediction to generate the predictive blocks of a PU, video encoder 20 may generate the predictive blocks of the PU based on decoded samples of one or more pictures other than the picture associated with the PU. Inter prediction may be uni-directional inter prediction (i.e., uni-prediction) or bi-directional inter prediction (i.e., bi-prediction). To perform uni-prediction or bi-prediction, video encoder 20 may generate a first reference picture list (RefPicList0) and a second reference picture list (RefPicList1) for a current slice.

Each of the reference picture lists may include one or more reference pictures. When using uni-prediction, video encoder 20 may search the reference pictures in either or both RefPicList0 and RefPicList1 to determine a reference location within a reference picture. Furthermore, when using uni-prediction, video encoder 20 may generate, based at least in part on samples corresponding to the reference location, the predictive sample blocks for the PU. Moreover, when using uni-prediction, video encoder 20 may generate a single motion vector that indicates a spatial displacement between a prediction block of the PU and the reference location. To indicate the spatial displacement between a prediction block of the PU and the reference location, a motion vector may include a horizontal component specifying a horizontal displacement between the prediction block of the PU and the reference location and may include a vertical component specifying a vertical displacement between the prediction block of the PU and the reference location.

When using bi-prediction to encode a PU, video encoder 20 may determine a first reference location in a reference picture in RefPicList0 and a second reference location in a reference picture in RefPicList1. Video encoder 20 may then generate, based at least in part on samples corresponding to the first and second reference locations, the predictive blocks for the PU. Moreover, when using bi-prediction to encode the PU, video encoder 20 may generate a first motion vector indicating a spatial displacement between a sample block of the PU and the first reference location and a second motion vector indicating a spatial displacement between the prediction block of the PU and the second reference location.

In some examples, JEM/VVC also provides an affine motion compensation mode, which may be considered an inter-prediction mode. In affine motion compensation mode, video encoder 20 may determine two or more motion vectors that represent non-translational motion, such as zoom in or out, rotation, perspective motion, or other irregular motion types.

After video encoder 20 generates predictive luma, Cb, and Cr blocks for one or more PUs of a CU, video encoder 20 may generate a luma residual block for the CU. Each sample in the CU's luma residual block indicates a difference between a luma sample in one of the CU's predictive luma blocks and a corresponding sample in the CU's original luma coding block. In addition, video encoder 20 may generate a Cb residual block for the CU. Each sample in the CU's Cb residual block may indicate a difference between a Cb sample in one of the CU's predictive Cb blocks and a corresponding sample in the CU's original Cb coding block. Video encoder 20 may also generate a Cr residual block for the CU. Each sample in the CU's Cr residual block may indicate a difference between a Cr sample in one of the CU's predictive Cr blocks and a corresponding sample in the CU's original Cr coding block.

Furthermore, video encoder 20 may use quad-tree partitioning to decompose the luma, Cb, and Cr residual blocks of a CU into one or more luma, Cb, and Cr transform blocks. A transform block may be a rectangular block of samples on which the same transform is applied. A transform unit (TU) of a CU may comprise a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax structures used to transform the transform block samples. In a monochrome picture or a picture that has three separate color planes, a TU may comprise a single transform block and syntax structures used to transform the transform block samples. Thus, each TU of a CU may be associated with a luma transform block, a Cb transform block, and a Cr transform block. The luma transform block associated with the TU may be a sub-block of the CU's luma residual block. The Cb transform block may be a sub-block of the CU's Cb residual block. The Cr transform block may be a sub-block of the CU's Cr residual block.

Video encoder 20 may apply one or more transforms to a luma transform block of a TU to generate a luma coefficient block for the TU. A coefficient block may be a two-dimensional array of transform coefficients. A transform coefficient may be a scalar quantity. Video encoder 20 may apply one or more transforms to a Cb transform block of a TU to generate a Cb coefficient block for the TU. Video encoder 20 may apply one or more transforms to a Cr transform block of a TU to generate a Cr coefficient block for the TU.

After generating a coefficient block (e.g., a luma coefficient block, a Cb coefficient block or a Cr coefficient block), video encoder 20 may quantize the coefficient block. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. Furthermore, video encoder 20 may inverse quantize transform coefficients and apply an inverse transform to the transform coefficients in order to reconstruct transform blocks of TUs of CUs of a picture. Video encoder 20 may use the reconstructed transform blocks of TUs of a CU and the predictive blocks of PUs of the CU to reconstruct coding blocks of the CU. By reconstructing the coding blocks of each CU of a picture, video encoder 20 may reconstruct the picture. Video encoder 20 may store reconstructed pictures in a decoded picture buffer (DPB). Video encoder 20 may use reconstructed pictures in the DPB for inter prediction and intra prediction.

After video encoder 20 quantizes a coefficient block, video encoder 20 may entropy encode syntax elements that indicate the quantized transform coefficients. For example, video encoder 20 may perform Context-Adaptive Binary Arithmetic Coding (CABAC) on the syntax elements indicating the quantized transform coefficients. Video encoder 20 may output the entropy-encoded syntax elements in a bitstream.

Video encoder 20 may output a bitstream that includes a sequence of bits that forms a representation of coded pictures and associated data. The bitstream may comprise a sequence of network abstraction layer (NAL) units. Each of the NAL units includes a NAL unit header and encapsulates a raw byte sequence payload (RBSP). The NAL unit header may include a syntax element that indicates a NAL unit type code. The NAL unit type code specified by the NAL unit header of a NAL unit indicates the type of the NAL unit. A RBSP may be a syntax structure containing an integer number of bytes that is encapsulated within a NAL unit. In some instances, an RBSP includes zero bits.

Different types of NAL units may encapsulate different types of RBSPs. For example, a first type of NAL unit may encapsulate a RBSP for a picture parameter set (PPS), a second type of NAL unit may encapsulate a RBSP for a coded slice, a third type of NAL unit may encapsulate a RBSP for Supplemental Enhancement Information (SEI), and so on. A PPS is a syntax structure that may contain syntax elements that apply to zero or more entire coded pictures. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) may be referred to as video coding layer (VCL) NAL units. A NAL unit that encapsulates a coded slice may be referred to herein as a coded slice NAL unit. A RBSP for a coded slice may include a slice header and slice data.

Video decoder 30 may receive a bitstream. In addition, video decoder 30 may parse the bitstream to decode syntax elements from the bitstream. Video decoder 30 may reconstruct the pictures of the video data based at least in part on the syntax elements decoded from the bitstream. The process to reconstruct the video data may be generally reciprocal to the process performed by video encoder 20. For instance, video decoder 30 may use motion vectors of PUs to determine predictive blocks for the PUs of a current CU. Video decoder 30 may use a motion vector or motion vectors of PUs to generate predictive blocks for the PUs.

In addition, video decoder 30 may inverse quantize coefficient blocks associated with TUs of the current CU. Video decoder 30 may perform inverse transforms on the coefficient blocks to reconstruct transform blocks associated with the TUs of the current CU. Video decoder 30 may reconstruct the coding blocks of the current CU by adding the samples of the predictive sample blocks for PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. By reconstructing the coding blocks for each CU of a picture, video decoder 30 may reconstruct the picture. Video decoder 30 may store decoded pictures in a decoded picture buffer for output and/or for use in decoding other pictures.

Next generation video applications are anticipated to operate with video data representing captured scenery with HDR and/or WCG. Parameters of the utilized dynamic range and color gamut are two independent attributes of video content, and their specification for purposes of digital television and multimedia services is defined by several international standards. For example, ITU-R Rec. BT.709, “Parameter values for the HDTV standards for production and international programme exchange,” defines parameters for HDTV (high definition television), such as standard dynamic range (SDR) and standard color gamut, and ITU-R Rec. BT.2020, “Parameter values for ultra-high definition television systems for production and international programme exchange,” specifies UHDTV (ultra-high definition television) parameters, such as HDR and WCG. There are also other standards developing organization (SDO) documents that specify dynamic range and color gamut attributes in other systems; e.g., the DCI-P3 color gamut is defined in SMPTE-231-2 (Society of Motion Picture and Television Engineers), and some parameters of HDR are defined in SMPTE-2084. A brief description of dynamic range and color gamut for video data is provided below.

Dynamic range is typically defined as the ratio between the minimum and maximum brightness (e.g., luminance) of the video signal. Dynamic range may also be measured in terms of ‘f-stops,’ where one f-stop corresponds to a doubling of a signal's dynamic range. In MPEG's definition, HDR content is content that features brightness variation of more than 16 f-stops. In some definitions, levels between 10 and 16 f-stops are considered an intermediate dynamic range, while in other definitions they are considered HDR. In some examples of this disclosure, HDR video content may be any video content that has a higher dynamic range than traditionally used video content with a standard dynamic range (e.g., video content as specified by ITU-R Rec. BT.709).

The human visual system (HVS) is capable of perceiving much larger dynamic ranges than SDR content and HDR content. However, the HVS includes an adaptation mechanism to narrow the dynamic range of the HVS to a so-called simultaneous range. The width of the simultaneous range may be dependent on current lighting conditions (e.g., current brightness). A visualization of the dynamic range provided by SDR of HDTV, the expected HDR of UHDTV, and the HVS dynamic range is shown in FIG. 3, although the exact range may vary based on each individual and display.

Some example video applications and services are regulated by ITU-R Rec. BT.709 and provide SDR, typically supporting a range of brightness (e.g., luminance) of around 0.1 to 100 candelas (cd) per m² (often referred to as “nits”), leading to less than 10 f-stops. Some example next generation video services are expected to provide a dynamic range of up to 16 f-stops. Although detailed specifications for such content are currently under development, some initial parameters have been specified in SMPTE-2084 and ITU-R Rec. BT.2020.
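
Expressed in f-stops (one f-stop per doubling, as defined above), this roughly 0.1-to-100 nit range corresponds to just under 10 f-stops:

$$\log_2 \frac{L_{\max}}{L_{\min}} = \log_2 \frac{100}{0.1} = \log_2 1000 \approx 9.97$$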

Another aspect of a more realistic video experience, besides HDR, is the color dimension. The color dimension is typically defined by the color gamut. FIG. 4 is a conceptual diagram showing an SDR color gamut (triangle 100, based on the BT.709 color primaries) and the wider color gamut for UHDTV (triangle 102, based on the BT.2020 color primaries). FIG. 4 also depicts the so-called spectrum locus (delimited by the tongue-shaped area 104), representing the limits of the natural colors. As illustrated by FIG. 4, moving from the BT.709 (triangle 100) to the BT.2020 (triangle 102) color primaries aims to provide UHDTV services with about 70% more colors. D65 specifies an example white color for the BT.709 and/or BT.2020 specifications.

Examples of color gamut specifications for the DCI-P3, BT.709, and BT.2020 color spaces are shown in Table 1.

TABLE 1
Color gamut parameters (RGB color space parameters)

Color space      White point         Primary colors
                 x_W      y_W        x_R    y_R    x_G    y_G    x_B    y_B
DCI-P3           0.314    0.351      0.680  0.320  0.265  0.690  0.150  0.060
ITU-R BT.709     0.3127   0.3290     0.64   0.33   0.30   0.60   0.15   0.06
ITU-R BT.2020    0.3127   0.3290     0.708  0.292  0.170  0.797  0.131  0.046

As can be seen in Table 1, a color gamut may be defined by the x and y values of a white point, and by the x and y values of the primary colors (e.g., red (R), green (G), and blue (B)). The x and y values represent the chromaticity coordinates of the colors, as defined by the CIE 1931 color space. The CIE 1931 color space defines the links between pure colors (e.g., in terms of wavelengths) and how the human eye perceives such colors.
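
As an illustration of how the primaries in Table 1 bound a gamut, the following sketch tests whether a CIE 1931 (x, y) chromaticity lies inside a gamut triangle. The half-plane formulation and the test point are illustrative, not part of the disclosure, and the check ignores luminance:

```python
GAMUTS = {  # (x, y) chromaticities from Table 1: R, G, B primaries
    "DCI-P3":        [(0.680, 0.320), (0.265, 0.690), (0.150, 0.060)],
    "ITU-R BT.709":  [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)],
    "ITU-R BT.2020": [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)],
}

def _cross(o, a, b):
    # Signed area test: which side of edge o->a the point b falls on.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_gamut(point, gamut: str) -> bool:
    """True if an (x, y) chromaticity lies inside the gamut triangle."""
    r, g, b = GAMUTS[gamut]
    d1, d2, d3 = _cross(r, g, point), _cross(g, b, point), _cross(b, r, point)
    # Inside when the point is on the same side of all three edges.
    return (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0)

# A saturated green inside BT.2020 but outside BT.709:
print(in_gamut((0.2, 0.7), "ITU-R BT.2020"))  # True
print(in_gamut((0.2, 0.7), "ITU-R BT.709"))   # False
```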

HDR/WCG video data is typically acquired and stored at a very high precision per component (even floating point), with the 4:4:4 chroma sub-sampling format and a very wide color space (e.g., CIE XYZ). This representation targets high precision and is almost mathematically lossless. However, such a format for storing HDR/WCG video data may include a lot of redundancies and may not be optimal for compression purposes. A lower precision format with HVS-based assumptions is typically utilized for state-of-the-art video applications.

One example of a video data format conversion process for purposes of compression includes three major processes, as shown in FIG. 5. The techniques of FIG. 5 may be performed by source device 12. Linear RGB data 110 may be HDR/WCG video data and may be stored in a floating-point representation. Linear RGB data 110 may be compacted using a non-linear transfer function (TF) 112 for dynamic range compacting. Transfer function 112 may compact linear RGB data 110 using any number of non-linear transfer functions, e.g., the PQ TF as defined in SMPTE-2084. In some examples, color conversion process 114 converts the compacted data into a more compact or robust color space (e.g., a YUV or YCrCb color space) that is more suitable for compression by a hybrid video encoder. This data is then quantized using a floating-to-integer representation quantization unit 116 to produce converted HDR′ data 118. In this example, HDR′ data 118 is in an integer representation. The HDR′ data is now in a format more suitable for compression by a hybrid video encoder (e.g., video encoder 20 applying HEVC techniques). The order of the processes depicted in FIG. 5 is given as an example, and may vary in other applications. For example, color conversion may precede the TF process. In some examples, additional processing, e.g., spatial subsampling, may be applied to the color components.
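
As one assumed instance of the floating-to-integer quantization performed by quantization unit 116, the conventional limited-range 10-bit code-level mapping for Y′CbCr video could look like the following; the disclosure does not mandate this particular quantizer:

```python
def quantize_10bit_limited(y: float, cb: float, cr: float):
    """Floating-to-integer quantization to 10-bit limited-range codes.

    Assumes Y' in [0, 1] and Cb, Cr in [-0.5, 0.5]; yields code levels
    64..940 for luma and 64..960 for chroma.
    """
    scale = 1 << (10 - 8)  # 2^(bit_depth - 8) = 4 for 10-bit
    clip = lambda v: max(0, min(1023, v))
    d_y = clip(round((219.0 * y + 16.0) * scale))
    d_cb = clip(round((224.0 * cb + 128.0) * scale))
    d_cr = clip(round((224.0 * cr + 128.0) * scale))
    return d_y, d_cb, d_cr

# Mid-gray luma with neutral chroma:
print(quantize_10bit_limited(0.5, 0.0, 0.0))  # (502, 512, 512)
```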

The inverse conversion at the decoder side is depicted in FIG. 6. The techniques of FIG. 6 may be performed by destination device 14. Converted HDR′ data 120 may be obtained at destination device 14 through decoding video data using a hybrid video decoder (e.g., video decoder 30 applying HEVC techniques). HDR′ data 120 may then be inverse quantized by inverse quantization unit 122. Then an inverse color conversion process 124 may be applied to the inverse quantized HDR′ data. The inverse color conversion process 124 may be the inverse of color conversion process 114. For example, the inverse color conversion process 124 may convert the HDR′ data from a YCrCb format back to an RGB format. Next, inverse transfer function 126 may be applied to the data to add back the dynamic range that was compacted by transfer function 112 to recreate the linear RGB data 128.

The techniques depicted in FIG. 5 will now be discussed in more detail. Mapping the digital values appearing in an image container to and from optical energy may involve the use of a “transfer function.” In general, a transfer function is applied to data (e.g., HDR/WCG video data) to compact the dynamic range of the data. Such compaction allows the data to be represented with fewer bits. In one example, the transfer function may be a one-dimensional (1D) non-linear function and may reflect the inverse of an electro-optical transfer function (EOTF) of the end-user display, e.g., as specified for SDR in ITU-R BT.1886 (also defined in Rec. 709). In another example, the transfer function may approximate the HVS perception of brightness changes, e.g., the PQ transfer function specified in SMPTE-2084 for HDR. The inverse process of the OETF is the EOTF (electro-optical transfer function), which maps the code levels back to luminance. FIG. 7 shows several examples of non-linear transfer functions used to compact the dynamic range of certain color containers. The transfer functions may also be applied to each R, G, and B component separately.

The reference EOTF specified in ITU-R Recommendation BT.1886 is defined by the equation:

L = a(max[(V + b), 0])^γ

where:

L: Screen luminance in cd/m²

L_(W): Screen luminance for white

L_(B): Screen luminance for black

V: Input video signal level (normalized, black at V=0, white at V=1). For content mastered per Recommendation ITU-R BT.709, 10-bit digital code values "D" map into values of V per the following equation: V=(D−64)/876

γ: Exponent of power function, γ=2.404

a: Variable for user gain (legacy “contrast” control)

a = (L_(W)^(1/γ) − L_(B)^(1/γ))^γ

b: Variable for user black level lift (legacy “brightness” control)

$b = \frac{L_{B}^{1/\gamma}}{L_{W}^{1/\gamma} - L_{B}^{1/\gamma}}$

The above variables a and b are derived by solving the following equations in order that V=1 gives L=L_(W), and that V=0 gives L=L_(B):

L_(B) = a·b^γ

L_(W) = a·(1+b)^γ
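For illustration only, the BT.1886 EOTF and the 10-bit code-to-V mapping above can be transcribed into a short Python sketch; the function names and the default white/black luminances are assumptions made here, not part of the standard:

    def code_to_v(D, bit_depth=10):
        # Map a 10-bit digital code value D to normalized V per the BT.709 mastering rule above.
        return (D - 64) / 876.0

    def bt1886_eotf(V, L_W=100.0, L_B=0.1, gamma=2.404):
        # Solve for a and b so that V=1 gives L=L_W and V=0 gives L=L_B.
        a = (L_W ** (1.0 / gamma) - L_B ** (1.0 / gamma)) ** gamma
        b = L_B ** (1.0 / gamma) / (L_W ** (1.0 / gamma) - L_B ** (1.0 / gamma))
        # Reference EOTF: L = a * max(V + b, 0)^gamma, in cd/m^2.
        return a * max(V + b, 0.0) ** gamma

For example, bt1886_eotf(code_to_v(940)) evaluates to L_W (here 100 cd/m²) for the 10-bit narrow-range white code 940, consistent with the derivation of a and b above.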

In order to support higher dynamic range data more efficiently, SMPTE has recently standardized a new transfer function called SMPTE ST-2084. The specification of ST-2084 defines the EOTF application as follows. A TF is applied to normalized linear R, G, B values, which results in a nonlinear representation of R′G′B′. ST-2084 defines normalization by NORM=10000, which is associated with a peak brightness of 10,000 nits (cd/m²).

R′ = PQ_TF(max(0, min(R/NORM, 1)))

G′ = PQ_TF(max(0, min(G/NORM, 1)))

B′ = PQ_TF(max(0, min(B/NORM, 1)))

with

$PQ\_TF(L) = \left(\frac{c_1 + c_2 L^{m_1}}{1 + c_3 L^{m_1}}\right)^{m_2}$

$m_1 = \frac{2610}{4096} \times \frac{1}{4} = 0.1593017578125$

$m_2 = \frac{2523}{4096} \times 128 = 78.84375$

$c_1 = c_3 - c_2 + 1 = \frac{3424}{4096} = 0.8359375$

$c_2 = \frac{2413}{4096} \times 32 = 18.8515625$

$c_3 = \frac{2392}{4096} \times 32 = 18.6875$

Typically, an EOTF is defined as a function with floating-point accuracy; thus, no error is introduced to a signal with this non-linearity if an inverse TF (so-called OETF) is applied. The inverse TF (OETF) specified in ST-2084 is defined as the inversePQ function:

R = 10000 * inversePQ_TF(R′)

G = 10000 * inversePQ_TF(G′)

B = 10000 * inversePQ_TF(B′)

with

$inversePQ\_TF(N) = \left(\frac{\max\left[(N^{1/m_2} - c_1), 0\right]}{c_2 - c_3 N^{1/m_2}}\right)^{1/m_1}$

where $m_1$, $m_2$, $c_1$, $c_2$, and $c_3$ take the same values as in the forward PQ_TF above.
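As a numerical cross-check of the two definitions above, the following Python sketch implements PQ_TF and inversePQ_TF with the ST-2084 constants (a minimal transcription for illustration; the function names are ours):

    m1 = 2610.0 / 4096 / 4      # 0.1593017578125
    m2 = 2523.0 / 4096 * 128    # 78.84375
    c2 = 2413.0 / 4096 * 32     # 18.8515625
    c3 = 2392.0 / 4096 * 32     # 18.6875
    c1 = c3 - c2 + 1            # 0.8359375

    def pq_tf(L):
        # Forward PQ: normalized linear light L in [0, 1] -> nonlinear code in [0, 1].
        Lm1 = L ** m1
        return ((c1 + c2 * Lm1) / (1 + c3 * Lm1)) ** m2

    def inverse_pq_tf(N):
        # Inverse PQ: nonlinear code N in [0, 1] -> normalized linear light in [0, 1].
        Nm2 = N ** (1.0 / m2)
        return (max(Nm2 - c1, 0.0) / (c2 - c3 * Nm2)) ** (1.0 / m1)

    # Round-trip a linear component of 1000 nits (NORM = 10000):
    R = 1000.0
    R_prime = pq_tf(max(0.0, min(R / 10000.0, 1.0)))
    assert abs(10000.0 * inverse_pq_tf(R_prime) - R) < 1e-3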

Note that the EOTF and OETF are a subject of very active research and standardization, and the TF utilized in some video coding systems may be different from ST-2084.

In the context of this disclosure, the terms "signal value" or "color value" may be used to describe a luminance level corresponding to the value of a specific color component (such as R, G, B, or Y) for an image element. The signal value is typically representative of a linear light level (luminance value). The terms "code level" or "digital code value" may refer to a digital representation of an image signal value. Typically, such a digital representation is representative of a nonlinear signal value. An EOTF represents the relationship between the nonlinear signal values provided to a display device (e.g., display device 32) and the linear color values produced by the display device.

RGB data is typically utilized as the input color space, since RGB is the type of data that is typically produced by image capturing sensors. However, the RGB color space has high redundancy among its components and is not optimal for compact representation. To achieve a more compact and more robust representation, RGB components are typically converted (e.g., a color transform is performed) to a more uncorrelated color space that is more suitable for compression, e.g., YCbCr. A YCbCr color space separates the brightness, in the form of luminance (Y), and the color information (CrCb) into different, less-correlated components. In this context, a robust representation may refer to a color space featuring higher levels of error resilience when compressed at a constrained bitrate.

For modern video coding systems, a typically used color space is YCbCr, as specified in ITU-R BT.709. The YCbCr color space in the BT.709 standard specifies the following conversion process from R′G′B′ to Y′CbCr (non-constant luminance representation):

a. Y′ = 0.2126 * R′ + 0.7152 * G′ + 0.0722 * B′

b. $Cb = \frac{B' - Y'}{1.8556}$

c. $Cr = \frac{R' - Y'}{1.5748}$

The above can also be implemented using the following approximate conversion that avoids the division for the Cb and Cr components:

a. Y′ = 0.212600 * R′ + 0.715200 * G′ + 0.072200 * B′

b. Cb = −0.114572 * R′ − 0.385428 * G′ + 0.500000 * B′

c. Cr = 0.500000 * R′ − 0.454153 * G′ − 0.045847 * B′

The ITU-R BT.2020 standard specifies the following conversion process from R′G′B′ to Y′CbCr (non-constant luminance representation):

a. Y′ = 0.2627 * R′ + 0.6780 * G′ + 0.0593 * B′

b. $Cb = \frac{B' - Y'}{1.8814}$

c. $Cr = \frac{R' - Y'}{1.4746}$

The above can also be implemented using the following approximate conversion that avoids the division for the Cb and Cr components:

a. Y′ = 0.262700 * R′ + 0.678000 * G′ + 0.059300 * B′

b. Cb = −0.139630 * R′ − 0.360370 * G′ + 0.500000 * B′

c. Cr = 0.500000 * R′ − 0.459786 * G′ − 0.040214 * B′
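The two approximate conversions above differ only in their coefficients, so both can share one routine; a minimal Python sketch for illustration (the dictionary keys "bt709" and "bt2020" are our own names, not standard identifiers):

    # Approximate R'G'B' -> Y'CbCr matrices, coefficients copied from the conversions above.
    YCBCR_MATRICES = {
        "bt709":  ((0.212600, 0.715200, 0.072200),
                   (-0.114572, -0.385428, 0.500000),
                   (0.500000, -0.454153, -0.045847)),
        "bt2020": ((0.262700, 0.678000, 0.059300),
                   (-0.139630, -0.360370, 0.500000),
                   (0.500000, -0.459786, -0.040214)),
    }

    def rgb_to_ycbcr(r, g, b, standard="bt709"):
        # Convert nonlinear R'G'B' components (0..1) to Y'CbCr with the chosen matrix.
        (yr, yg, yb), (ur, ug, ub), (vr, vg, vb) = YCBCR_MATRICES[standard]
        y  = yr * r + yg * g + yb * b
        cb = ur * r + ug * g + ub * b
        cr = vr * r + vg * g + vb * b
        return y, cb, cr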

Following the color transform, input data in a target color space may still be represented at high bit-depth (e.g., floating-point accuracy). The high bit-depth data may be converted to a target bit-depth, for example, using a quantization process. Certain studies show that 10-12 bits of accuracy, in combination with the PQ transfer function, is sufficient to provide HDR data of 16 f-stops with distortion below the Just-Noticeable Difference (JND). In general, a JND is the amount something (e.g., video data) must be changed in order for a difference to be noticeable (e.g., by the HVS). Data represented with 10-bit accuracy can be further coded with most of the state-of-the-art video coding solutions. This quantization is an element of lossy coding and is a source of inaccuracy introduced to the converted data.

An example of such quantization applied to code words in a target color space (in this example, YCbCr) is shown below. Input YCbCr values represented in floating-point accuracy are converted into a signal of fixed bit-depth BitDepthY for the Y value and BitDepthC for the chroma values (Cb, Cr).

D_(Y′) = Clip1_(Y)(Round((1 << (BitDepth_(Y) − 8)) * (219 * Y′ + 16)))

D_(Cb) = Clip1_(C)(Round((1 << (BitDepth_(C) − 8)) * (224 * Cb + 128)))

D_(Cr) = Clip1_(C)(Round((1 << (BitDepth_(C) − 8)) * (224 * Cr + 128)))

with

Round(x)=Sign(x)*Floor(Abs(x)+0.5)

Sign(x) = −1 if x < 0, 0 if x = 0, 1 if x > 0

Floor(x) is the largest integer less than or equal to x

Abs(x) = x if x >= 0, −x if x < 0

Clip1_(Y)(x)=Clip3(0,(1<<BitDepth_(Y))−1,x)

Clip1_(C)(x)=Clip3(0,(1<<BitDepth_(C))−1,x)

Clip3(x, y, z) = x if z < x, y if z > y, z otherwise
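A direct Python transcription of this fixed bit-depth conversion, using the Round, Sign, Floor, Abs, and Clip definitions above (the function names are ours, for illustration):

    import math

    def clip3(lo, hi, z):
        # Clip3: clamp z into the inclusive range [lo, hi].
        return lo if z < lo else hi if z > hi else z

    def round_half_away(x):
        # Round(x) = Sign(x) * Floor(Abs(x) + 0.5)
        return int(math.copysign(math.floor(abs(x) + 0.5), x))

    def quantize_ycbcr(y, cb, cr, bit_depth_y=10, bit_depth_c=10):
        # Convert floating-point Y'CbCr to narrow-range fixed bit-depth code values.
        sy, sc = 1 << (bit_depth_y - 8), 1 << (bit_depth_c - 8)
        d_y  = clip3(0, (1 << bit_depth_y) - 1, round_half_away(sy * (219 * y + 16)))
        d_cb = clip3(0, (1 << bit_depth_c) - 1, round_half_away(sc * (224 * cb + 128)))
        d_cr = clip3(0, (1 << bit_depth_c) - 1, round_half_away(sc * (224 * cr + 128)))
        return d_y, d_cb, d_cr

For example, quantize_ycbcr(1.0, 0.0, 0.0) yields (940, 512, 512) at 10 bits, the narrow-range white point.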

A rate distortion optimized quantizer (RDOQ) will now be described. Most of the state-of-the-art video coding solutions (e.g., HEVC and the developing VVC) are based on the so-called hybrid video coding scheme, which basically applies scalar quantization to the transform coefficients resulting from the residual signal, produced in turn by applying temporal or spatial prediction between the currently coded video signal and reference picture(s) available at the decoder side. Scalar quantization is applied on the encoder side (e.g., video encoder 20) and inverse scalar dequantization is applied on the decoder side (e.g., video decoder 30). Lossy scalar quantization introduces distortion to the reconstructed signal and requires a certain number of bits to deliver the quantized transform coefficients, as well as the coding modes description, to the decoder side.

During the evolution of video compression techniques, a number of approaches targeting the improvement of quantized coefficient calculation have been developed. One approach is Rate Distortion Optimized Quantization (RDOQ), which is based on a rough estimation of the RD cost of modifying or removing a selected transform coefficient or transform coefficient group. The purpose of RDOQ is to find an optimal or near-optimal set of quantized transform coefficients representing the residual data in an encoded block. RDOQ calculates the image distortion (introduced by quantization of transform coefficients) in an encoded block and the number of bits needed to encode the corresponding quantized transform coefficient. Based on these two values, the encoder chooses the better coefficient value by calculating the RD cost.

RDOQ in the encoder may include three stages: quantization of transform coefficients, elimination of coefficient groups (CGs), and selection of the last non-zero coefficient. In the first stage, the video encoder produces quantized transform coefficients using a uniform quantizer without a dead zone, which results in a Level value for the current transform coefficient. Following this, the video encoder considers two additional magnitudes of this quantized coefficient: Level−1 and 0. For each of these three options {Level, Level−1, 0}, the video encoder calculates the RD cost of encoding the coefficient with the selected magnitude and chooses the one with the lowest RD cost. In addition, some RDOQ implementations may consider nullifying a transform coefficient group entirely, or reducing the size of the signaled transform coefficient group by reducing the position of the last signaled coefficient for each of the groups. At the decoder side, inverse scalar quantization is applied to quantized transform coefficients derived from the syntax elements of the bitstream.
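A heavily simplified sketch of the first stage described above; the Lagrangian cost model and the bit estimator here are placeholders (real RDOQ queries the entropy coder's context state), so treat this only as an illustration of the {Level, Level−1, 0} candidate test:

    def rdoq_pick_level(coeff, q_step, lam, bits_for_level):
        # Uniform quantization without a dead zone gives the starting Level.
        level = int(abs(coeff) / q_step)
        best_level, best_cost = 0, float("inf")
        # Try the three candidate magnitudes {Level, Level-1, 0}.
        for cand in sorted({level, max(level - 1, 0), 0}):
            recon = cand * q_step                     # dequantized magnitude
            dist = (abs(coeff) - recon) ** 2          # distortion term D
            cost = dist + lam * bits_for_level(cand)  # RD cost J = D + lambda * R
            if cost < best_cost:
                best_level, best_cost = cand, cost
        return best_level

A toy rate model such as bits_for_level = lambda l: 1 + 2 * l.bit_length() is enough to exercise the function; a real encoder substitutes its CABAC-based estimates.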

Some of the existing transfer functions and color transforms used in video coding may result in a video data representation that features significant variation of Just-Noticeable Difference (JND) threshold values over the dynamic range of the signal representation. That is, some ranges of codeword values for luma and/or chroma components may have different JND threshold values than other ranges of codeword values for the luma and/or chroma components. For such representations, a quantization scheme that is uniform over the dynamic range of luma values (e.g., uniform over all codeword values for luma) would introduce quantization error whose perceptual impact differs across the signal fragments (partitions of the dynamic range). Such an impact on signals may be interpreted by a viewer as a processing system with non-uniform quantization, which results in unequal signal-to-noise ratios within the processed data range.

An example of such a representation is a video signal represented in the Non-Constant Luminance (NCL) YCbCr color space whose color primaries are defined in ITU-R Rec. BT.2020, with the ST-2084 transfer function. As illustrated in Table 2, this NCL YCbCr color space allocates a significantly larger number of codewords to the low intensity values of the signal, e.g., 30% of codewords represent linear light samples <10 nits, whereas high intensity samples (high brightness) are represented with a much smaller number of codewords, e.g., 25% of codewords are allocated to linear light in the range 1000-10000 nits. As a result, a video coding system, e.g., H.265/HEVC, featuring uniform quantization for all ranges of the data would introduce much more severe coding artifacts to the high intensity samples (bright regions of the signal), whereas distortion introduced to low intensity samples (dark regions of the same signal) would be far below the noticeable difference.

TABLE 2
Relation between linear light intensity and code value in SMPTE ST 2084 (bit depth = 10)

Linear light intensity (cd/m²)   Full range   SDI range   Narrow range
~0.01                                    21           25             83
~0.1                                     64           67            119
~1                                      153          156            195
~10                                     307          308            327
~100                                    520          520            509
~1,000                                  769          767            723
~4,000                                  923          920            855
~10,000                                1023         1019            940

Effectively, this means that a video coding system design, or its encoding algorithms, may benefit from adjustment for every selected video data representation, namely for every selected transfer function and color space. Previously, the following methods have been proposed to address the problems with the non-optimal perceptual-quality codeword distribution described above.

In "Dynamic Range Adjustment SEI to enable High Dynamic Range video coding with Backward-Compatible Capability," D. Rusanovskyy, A. K. Ramasubramonian, D. Bugdayci, S. Lee, J. Sole, M. Karczewicz, VCEG document COM16-C 1027-E, September 2015, the authors proposed to apply a codeword re-distribution to video data prior to video coding. Video data in the ST-2084/BT.2020 representation undergoes a codeword re-distribution prior to video compression. The re-distribution introduces a linearization of perceived distortion (signal-to-noise ratio) within the dynamic range of the data through a Dynamic Range Adjustment. This redistribution was found to improve visual quality under bitrate constraints. To compensate for the redistribution and convert data to the original ST-2084/BT.2020 representation, an inverse process is applied to the data after video decoding.

One of the drawbacks of this approach is the fact that the pre-processing and post-processing are generally de-coupled from the rate distortion optimization processing employed by state-of-the-art encoders on a block basis. Therefore, the technique described in VCEG document COM16-C 1027-E does not employ information available to the decoder, such as the target frame rate or the quantization distortion introduced by the quantization scheme of the video codec.

In "Performance investigation of high dynamic range and wide color gamut video coding techniques," J. Zhao, S.-H. Kim, A. Segall, K. Misra, VCEG document COM16-C 1030-E, September 2015, an intensity-dependent spatially varying (block-based) quantization scheme was proposed to align bitrate allocation and visually perceived distortion between video coding applied on Y₂₀₂₀ (ST2084/BT.2020) and Y₇₀₉ (BT1886/BT.2020) representations. It was observed that to maintain the same level of quantization in luma components, the quantization of the signal in Y₂₀₂₀ and Y₇₀₉ differs by a value that depends on luma, such that:

QP_Y₂₀₂₀ = QP_Y₇₀₉ − ƒ(Y₂₀₂₀)

The function ƒ(Y₂₀₂₀) was found to be linear in the intensity values (brightness level) of video in Y₂₀₂₀, and it may be approximated as:

ƒ(Y₂₀₂₀) = max(0.03*Y₂₀₂₀ − 3, 0)
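In Python form, the reported relation is simply (a transcription of the two equations above, for illustration):

    def f(y2020):
        # Luma-dependent QP offset, zero below the knee at y2020 = 100 and linear above it.
        return max(0.03 * y2020 - 3.0, 0.0)

    def qp_y2020(qp_y709, y2020):
        # QP_Y2020 = QP_Y709 - f(Y2020)
        return qp_y709 - f(y2020)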

The proposed spatially varying quantization scheme, introduced at the encoding stage, was found to be able to improve the visually perceived signal-to-quantization-noise ratio for coded video signals in the ST2084/BT.2020 representation.

One of the drawbacks of this approach is the block-based granularity of QP adaptation. The block sizes selected at the encoder side for compression are typically derived through a rate distortion optimization process and may not represent the dynamic range properties of the video signal; thus, the selected QP settings will be sub-optimal for the signal inside the block. This problem may become even more important for next generation video coding systems, which tend to employ prediction and transform block sizes of larger dimensions. Another aspect of this design is the need to signal QP adaptation parameters to the decoder side for inverse dequantization. Additionally, spatial adaptation of quantization parameters at the encoder side increases the complexity of encoding optimization and may interfere with rate control algorithms.

In "Intensity dependent spatial quantization with application in HEVC," Matteo Naccari and Marta Mrak, in Proc. of IEEE ICME 2013, July 2013, an Intensity Dependent Spatial Quantization (IDSQ) perceptual mechanism was proposed. IDSQ exploits the intensity masking of the human visual system and perceptually adjusts quantization of the signal at the block level. The authors of this paper propose to employ in-loop pixel-domain scaling. Parameters of in-loop scaling for a currently processed block are derived from the average value of the luma component in the predicted block. At the decoder side, the inverse scaling is performed, and the decoder derives the scaling parameters from the predicted block available at the decoder side.

Similarly to the techniques in "Performance investigation of high dynamic range and wide color gamut video coding techniques," the block-based granularity of this approach restricts the performance of this method due to a sub-optimal scaling parameter, which is applied to all samples of the processed block. Another aspect of the proposed solution is that the scale value is derived from the predicted block and does not reflect signal fluctuation which may occur between the currently coded block and the predicted block.

"De-quantization and scaling for next generation containers," J. Zhao, A. Segall, S.-H. Kim, K. Misra (Sharp), JVET document B0054, January 2016, addresses the problem of non-uniform perceived distortion in the ST.2084/BT.2020 representation. The authors proposed to employ in-loop intensity-dependent block-based transform-domain scaling. Parameters of in-loop scaling for selected transform coefficients (AC coefficients) of the currently processed block are derived as a function of the average values of luma components in the predicted block and a DC value derived for the current block. At the decoder side, the inverse scaling is performed, and the decoder derives the parameters of AC coefficient scaling from the predicted block available at the decoder side and from a quantized DC value which is signalled to the decoder.

Similar to the techniques in "Performance investigation of high dynamic range and wide color gamut video coding techniques" and "Intensity dependent spatial quantization with application in HEVC," the block-based granularity of this approach restricts the performance of this method due to the sub-optimality of a scaling parameter which is applied to all samples of the processed block. Another aspect of the proposed solution is that the scale value is applied to AC transform coefficients only. Therefore, the signal-to-noise ratio improvement does not affect the DC value, which reduces the performance of the scheme. In addition, in some video coding system designs, the quantized DC value may not be available at the time of AC value scaling, e.g., in the case when a quantization process follows a cascade of transform operations. Another restriction of this proposal is that when the encoder selects the transform skip or transform/quantization bypass modes for the current block, scaling is not applied (hence, in the decoder, scaling is not defined for the transform skip and transform/quantization bypass modes), which is sub-optimal due to the exclusion of potential coding gain for these two modes.

In U.S. patent application Ser. No. 15/595,793, filed May 15, 2017, in-loop sample processing for video signals with non-uniformly distributed JND was described. This patent application describes the application of a scale and offset to signal samples represented either in the pixel, residual, or transform domain. Several algorithms for derivation of the scale and offset were proposed.

This disclosure describes several video coding and processing techniques that may be applied in the video coding loop (e.g., during the video encoding and/or decoding process and not in pre- or post-processing) of a video coding system. The techniques of this disclosure include encoder-side (e.g., video encoder 20) algorithms with content-adaptive spatially varying quantization without explicit signaling of quantization parameters (e.g., a change in a quantization parameter represented by a deltaQP syntax element) to more efficiently compress HDR/WCG video signals. The techniques of this disclosure also include decoder-side (e.g., video decoder 30) operations which improve the performance of video decoding tools that use quantization parameter information. Examples of such decoding tools can include deblocking filters, bilateral filters, adaptive loop filters, or other video coding tools that use quantization information as an input.

Video encoder 20 and/or video decoder 30 may be configured to perform one or more of the following techniques independently, or in any combination with the others.

In one example of the disclosure, video encoder 20 may be configured to perform a multi-stage quantization process for each block of video data in a picture of video data. The techniques described below may be applied to both luma and chroma components of the video data. Video encoder 20 may be configured to perform quantization using a base quantization parameter (QPb) value. That is, the QPb value is applied uniformly across all blocks. For a given base quantization parameter (QPb) value provided to the transform quantization to be applied to the samples s(Cb) of the coded block Cb, video encoder 20 may further be configured to utilize a content-dependent QP offset as a deviation from the QPb value. That is, for each block of video data, or for a group of blocks of video data, video encoder 20 may further determine a QP offset that is based on the content of the block or group of blocks.

In this way, video encoder 20 may account for a rate distortion optimized (RDO) selection of a quantization level LevelX, which is produced by an effectively different quantization parameter (QPe). In this disclosure, QPe may be referred to as an effective quantization parameter. QPe is the QP offset (deltaQP) plus the base QPb value. Video encoder 20 may derive QPe for a current block Cb using the following equation:

QPe(Cb)=QPb(Cb)+deltaQP(s(Cb)), with deltaQP>0  (1)

where the deltaQP(Cb) variable is derived from local properties (e.g., statistics) of the coded block Cb. For example, video encoder 20 may be configured to derive the value of deltaQP for block Cb using a function of the average of the sample values (e.g., luma or chroma values) of the block Cb. In other examples, video encoder 20 may use other functions of the sample values of block Cb to determine the value of deltaQP. For example, video encoder 20 may determine the value of deltaQP using a second order operation (e.g., variance) on the sample values of block Cb. As another example, video encoder 20 may determine the value of deltaQP using a function of the sample values of block Cb and the values of one or more samples of neighboring blocks. As will be explained in more detail below, video encoder 20 may be configured to quantize residual values of block Cb with both the QPb value and the QPe value. As a result, the residual data r(Cb) derived for currently coded block Cb is coded with quantization parameter QPb. However, the distortion introduced to the residual is first produced with quantization parameter QPe, resulting in transform quantization coefficients tq(Cb). Since QPe may vary from block to block, video encoder 20 may adjust for the varying JND threshold present in some color representations, and provide for a non-uniform quantization.

At video decoder 30, the quantized transform coefficients tq(Cb) undergo inverse quantization with base quantization parameter QPb. Video decoder 30 may derive the base quantization parameter QPb from a syntax element that is associated with the current block Cb. Video decoder 30 may receive the syntax element in an encoded video bitstream. Video decoder 30 may then perform one or more inverse transforms on the inverse quantized transform coefficients to create the decoded residual. Video decoder 30 may then perform a prediction process (e.g., inter prediction or intra prediction) to produce decoded samples d(Cb) for current block Cb.

Note that video decoder 30 does not use the effective quantization parameter QPe when reconstructing residual values for the block. As such, the distortion introduced by video encoder 20 when applying QPe during encoding remains in the residuals, thus improving the uneven JND threshold issues with certain color spaces, as discussed above. However, considering that the residual signal features the distortion introduced by quantization parameter QPe, which is larger than the QPb value which is communicated in the bitstream and associated with the current Cb, other decoding tools (e.g., in-loop filtering, entropy decoding, etc.) that rely on QP parameters provided by the bitstream for attenuating their operation may be adjusted to improve their performance. This adjustment is done by providing the coding tools under consideration with an estimate of the actual QPe which was applied by video encoder 20 to the Cb. As will be explained in more detail below, video decoder 30 may be configured to derive an estimate of the effective quantization parameter QPe from statistics of the decoded samples d(Cb) and other parameters of the bitstream. In this way, bit overhead is saved, as block-by-block values of QPe are not signaled in the bitstream.

The following sections provide non-limiting examples of implementations of the techniques of this disclosure. Initially, an example structure for video encoder 20 and an encoder-side algorithm will be described.

FIG. 8 is a block diagram illustrating an example of video encoder 20 that may implement the techniques of this disclosure. As shown in FIG. 8, video encoder 20 receives a current video block of video data within a video frame to be encoded. In accordance with the techniques of this disclosure, the video data received by video encoder 20 may be HDR and/or WCG video data. In the example of FIG. 8, video encoder 20 includes mode select unit 40, video data memory 41, DPB 64, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Mode select unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra prediction processing unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform processing unit 60, and summer 62. A deblocking filter (not shown in FIG. 8) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional filters (in loop or post loop) may also be used in addition to the deblocking filter. Such filters are not shown for brevity, but if desired, may filter the output of summer 50 (as an in-loop filter).

Video data memory 41 may store video data to be encoded by the components of video encoder 20. The video data stored in video data memory 41 may be obtained, for example, from video source 18. Decoded picture buffer 64 may be a reference picture memory that stores reference video data for use in encoding video data by video encoder 20, e.g., in intra- or inter-coding modes. Video data memory 41 and decoded picture buffer 64 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 41 and decoded picture buffer 64 may be provided by the same memory device or separate memory devices. In various examples, video data memory 41 may be on-chip with other components of video encoder 20, or off-chip relative to those components.

During the encoding process, video encoder 20 receives a video frame or slice to be coded. The frame or slice may be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra prediction processing unit 46 may alternatively perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.

Moreover, partition unit 48 may partition blocks of video data into sub-blocks, based on evaluation of previous partitioning schemes in previous coding passes. For example, partition unit 48 may initially partition a frame or slice into LCUs, and partition each of the LCUs into sub-CUs based on rate-distortion analysis (e.g., rate-distortion optimization). Mode select unit 40 may further produce a quadtree data structure indicative of partitioning of an LCU into sub-CUs. Leaf-node CUs of the quadtree may include one or more PUs and one or more TUs. In other examples, partition unit 48 may partition the input video data according to a QTBT partitioning structure.

Mode select unit 40 may select one of the coding modes, intra or inter, e.g., based on error results, and provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.

Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture (or other coded unit), relative to the current block being coded within the current picture (or other coded unit). A predictive block is a block that is found to closely match the block to be coded, in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in decoded picture buffer 64. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in decoded picture buffer 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated, in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Summer 50 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation relative to luma components, and motion compensation unit 44 uses motion vectors calculated based on the luma components for both chroma components and luma components. Mode select unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.

Intra prediction processing unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra prediction processing unit 46 may determine an intra-prediction mode to use to encode a current block. In some examples, intra prediction processing unit 46 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra prediction processing unit 46 (or mode select unit 40, in some examples) may select an appropriate intra-prediction mode to use from the tested modes.

For example, intra prediction processing unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. Intra prediction processing unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.

After selecting an intra-prediction mode for a block, intra prediction processing unit 46 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode. Video encoder 20 may include in the transmitted bitstream configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts.

Video encoder 20 forms a residual video block (e.g., r1(Cb) for current block Cb) by subtracting the prediction data from mode select unit 40 from the original video block being coded. Summer 50 represents the component or components that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform processing unit 52 may perform other transforms which are conceptually similar to DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms could also be used. In any case, transform processing unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. Transform processing unit 52 may send the resulting transform coefficients t(Cb) to quantization unit 54.

As described above, video encoder 20 may produce the residual signal r(Cb) of currently coded block Cb from samples of the currently coded block s(Cb) and predicted samples p(Cb) (e.g., predicted samples from inter-prediction or intra-prediction). Video encoder 20 may perform one or more forward transforms on residual r(Cb), which results in transform coefficients t(Cb). Video encoder 20 may then quantize the transform coefficients t(Cb) prior to entropy encoding. Quantization unit 54 quantizes the transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.

In accordance with the techniques of this disclosure, quantization unit 54 may be configured to perform a multi-stage quantization process on transform coefficients t(Cb). FIG. 9 is a block diagram illustrating an example quantization unit of a video encoder that may implement techniques of this disclosure.

As shown in FIG. 9, at a first stage, QPe determination unit 202 may be configured to derive a quantization parameter offset (deltaQP(s(Cb))) for the current block Cb. In one example, QPe determination unit 202 may be configured to derive deltaQP(s(Cb)) from a lookup table (e.g., LUT_DQP 204). LUT_DQP 204 includes the deltaQP values and is accessed by an index derived from the average of the s(Cb) samples (e.g., luma or chroma samples) of block Cb. The equation below shows one example of deriving a quantization parameter offset:

deltaQP(s(Cb))=LUT_DQP(mean(s(Cb)))  (2)

where LUT_DQP is the lookup table for deltaQP(s(Cb)) and mean(s(Cb)) is the average of the sample values of the block Cb.

In other examples, QPe determination unit 202 may be configured to derive the value of deltaQP(s(Cb)) by a function (e.g., a second order function based on variance) of some other characteristic of the samples of the coded block, or of characteristics of the bitstream. QPe determination unit 202 may be configured to determine the deltaQP value using an algorithm or a lookup table, or may explicitly derive the deltaQP value using other means. In some examples, the samples used to determine deltaQP( ) may include both luma and chroma samples, or more generally samples of one or more components of the coded block.

QPe determination unit 202 may then use the variable deltaQP(Cb) to derive the effective quantization parameter QPe, as shown in Equation (1) above. QPe determination unit 202 may then provide the QPe value to first quantization unit 206 and inverse quantization unit 208. At a second stage, first quantization unit 206 performs a forward quantization on transform coefficients t(Cb) using the derived QPe value. Then, inverse quantization unit 208 inversely quantizes the quantized transform coefficients using the QPe value, and inverse transform unit 210 performs an inverse transformation (e.g., the inverse transform of transform processing unit 52). This results in residual block r2(Cb) with the introduced distortions of QPe. An equation for the second stage of the process is shown below:

r2(Cb)=InverseTrans(InverseQuant(QPe,ForwardQuant(QPe,t(Cb))))  (3)

where InverseTrans is an inverse transformation process, InverseQuant is an inverse quantization process, and ForwardQuant is a forward quantization process.

At a third stage, transform processing unit 212 performs one or more forward transforms (e.g., the same as transform processing unit 52) on residual r2(Cb). Then, second quantization unit 214 performs a forward quantization on the transformed residual using the base quantization parameter QPb. This results in quantized transform coefficients tq(Cb), as shown in the equation below:

tq(Cb)=ForwardQuant(QPb,ForwardTrans(r2(Cb)))  (4)

where ForwardTrans is a forward transformation process.
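Putting equations (1)-(4) together, the three stages of FIG. 9 can be sketched in Python; the transform operators are passed in as callables, and the QP-to-step mapping below is a simplified HEVC-style approximation assumed here for illustration, not the codec's exact quantizer:

    import numpy as np

    def q_step(qp):
        # Approximation: the quantization step size doubles every 6 QP units.
        return 2.0 ** ((qp - 4) / 6.0)

    def quant(qp, t):
        return np.round(t / q_step(qp))

    def dequant(qp, levels):
        return levels * q_step(qp)

    def multi_stage_quantize(residual, qp_b, lut_dqp, mean_s_cb, fwd_trans, inv_trans):
        # Stage 1: content-dependent effective QP (equations (1) and (2)).
        qp_e = qp_b + lut_dqp[int(mean_s_cb)]
        # Stage 2: bake the QPe distortion into the residual r2(Cb) (equation (3)).
        r2 = inv_trans(dequant(qp_e, quant(qp_e, fwd_trans(residual))))
        # Stage 3: re-transform and quantize with the uniform base QP (equation (4)).
        tq = quant(qp_b, fwd_trans(r2))
        return tq, qp_e

Only tq(Cb) is entropy coded; qp_e never needs to be signalled, because the decoder can re-estimate it from the decoded samples, as described below.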

Returning to FIG. 8, following quantization, entropy encoding unit 56 entropy codes the quantized transform coefficients tq(Cb). For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, context may be based on neighboring blocks. Following the entropy coding by entropy encoding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.

Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of decoded picture buffer 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in decoded picture buffer 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.

Example embodiments of decoder-side processing will now be described. At the decoder side, certain coding tools are dependent on the quantization parameter associated with the QP value utilized for coding the current block, or group of blocks. Some non-limiting examples may include: deblocking filters, bilateral filters, loop filters, interpolation filters, entropy codec initialization, or others.

FIG. 10 is a block diagram illustrating an example of video decoder 30 that may implement the techniques of this disclosure. In the example of FIG. 10, video decoder 30 includes an entropy decoding unit 70, video data memory 71, motion compensation unit 72, intra prediction processing unit 74, inverse quantization unit 76, inverse transform processing unit 78, decoded picture buffer 82, summer 80, QPe estimation unit 84, LUT_DQP 86, and filter unit 88. Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 (FIG. 8). Motion compensation unit 72 may generate prediction data based on motion vectors received from entropy decoding unit 70, while intra prediction processing unit 74 may generate prediction data based on intra-prediction mode indicators received from entropy decoding unit 70.

Video data memory 71 may store video data, such as an encoded video bitstream, to be decoded by the components of video decoder 30. The video data stored in video data memory 71 may be obtained, for example, from computer-readable medium 16, e.g., from a local video source, such as a camera, via wired or wireless network communication of video data, or by accessing physical data storage media. Video data memory 71 may form a coded picture buffer (CPB) that stores encoded video data from an encoded video bitstream. Decoded picture buffer 82 may be a reference picture memory that stores reference video data for use in decoding video data by video decoder 30, e.g., in intra- or inter-coding modes. Video data memory 71 and decoded picture buffer 82 may be formed by any of a variety of memory devices, such as DRAM, including SDRAM, MRAM, RRAM, or other types of memory devices. Video data memory 71 and decoded picture buffer 82 may be provided by the same memory device or separate memory devices. In various examples, video data memory 71 may be on-chip with other components of video decoder 30, or off-chip relative to those components.

During the decoding process, video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 20. The encoded video bitstream may have been encoded by video encoder 20 using the multi-stage quantization process described above. The encoded video bitstream may also represent video data defined by an HDR and/or WCG color format. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. In some examples, entropy decoding unit 70 may decode a syntax element that indicates a base quantization parameter QPb for the blocks of video data to be decoded. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra prediction processing unit 74 may generate prediction data for a video block of the current video slice based on a signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B or P) slice, motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference picture lists, List 0 and List 1, using default construction techniques based on reference pictures stored in decoded picture buffer 82. Motion compensation unit 72 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice or P slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.

Motion compensation unit 72 may also perform interpolation based on interpolation filters. Motion compensation unit 72 may use interpolation filters as used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 72 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce predictive blocks.

Inverse quantization unit 76 inverse quantizes, i.e., de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include use of a base quantization parameter QPb determined by video decoder 30 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse transform processing unit 78 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.

After motion compensation unit 72 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform processing unit 78 with the corresponding predictive blocks generated by motion compensation unit 72. Summer 80 represents the component or components that perform this summation operation.

Filter unit 88 may be configured to apply one or more filtering operations to the decoded video data before output and storage in decoded picture buffer 82. The decoded video blocks in a given frame or picture are then stored in decoded picture buffer 82, which stores reference pictures used for subsequent motion compensation. Decoded picture buffer 82 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1. Example filters that may be applied by filter unit 88 include deblocking filters, bilateral filters, adaptive loop filters, sample adaptive offset filters, and others. For example, if desired, a deblocking filter may be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions, or otherwise improve the video quality.

In some examples, the parameters of a filter applied by filter unit 88 may be based on a quantization parameter. As described above, the video data received by video decoder 30 includes distortion introduced by video encoder 20 using the effective quantization parameter QPe, which is larger than the QPb value which is communicated in the bitstream and associated with the current Cb. Filters applied by filter unit 88 may rely on QP parameters provided by the bitstream to adjust their performance. Accordingly, video decoder 30 may be configured to derive an estimate of the actual QPe which was applied by video encoder 20 to the Cb. In this regard, video decoder 30 may include QPe estimation unit 84 to derive the value of QPe.

For example, QPe estimation unit 84 may be configured to estimate a quantization parameter offset (deltaQP(s(Cb))) for the current block Cb. In one example, QPe estimation unit 84 may be configured to estimate deltaQP(s(Cb)) from a lookup table (e.g., LUT_DQP 86). LUT_DQP 86 includes estimates of deltaQP values and is accessed by an index derived from the average of the decoded s(Cb) samples (e.g., luma or chroma samples) of block Cb. The equation below shows one example of deriving a quantization parameter offset:

deltaQP(s(Cb))=LUT_DQP(mean(s(Cb)))  (2)

where LUT_DQP is the lookup table for deltaQP(s(Cb)) and mean(s(Cb)) is the average of the decoded sample values of the block Cb.

In other examples, QPe estimation unit 84 may be configured to estimate the value of deltaQP(s(Cb)) by a function (e.g., a second order function based on variance) of some other characteristic of the samples of the coded block, or of characteristics of the bitstream. QPe estimation unit 84 may be configured to estimate the deltaQP value using an algorithm or a lookup table, or may explicitly estimate the deltaQP value using other means. In some examples, the samples used to determine deltaQP( ) may include both luma and chroma samples, or more generally samples of one or more components of the decoded block. QPe estimation unit 84 may then provide the estimated value of QPe to filter unit 88 for use by one or more coding tools implemented by filter unit 88.
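The decoder-side estimate mirrors the encoder-side derivation; a minimal Python sketch for illustration (the table contents are assumptions here, and a real implementation would index the LUT exactly as the encoder does):

    def estimate_qpe(decoded_samples, qp_b, lut_dqp):
        # deltaQP is looked up from statistics of the *decoded* samples d(Cb),
        # so no per-block deltaQP syntax element is needed in the bitstream.
        mean_d = sum(decoded_samples) / len(decoded_samples)
        return qp_b + lut_dqp[int(mean_d)]

The same LUT_DQP must be known at both encoder and decoder (fixed by convention, or signalled once, e.g., in a parameter set) for the estimate to match the encoder's derivation.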

In one example, filter unit 88 may be configured to perform deblocking filtering. In one non-limiting example of a deblocking implementation, the process of deblocking is given below as a change to the HEVC specification for deblocking filtering. Introduced changes are marked in double underline:

8.7.2.5.3 Decision Process for Luma Block Edges

The variables QpQ and QpP are set equal to the QpY_EQ and QpY_EP values of the coding units Cbq and Cbp, which include the processing blocks containing the samples q0,0 and p0,0, respectively. QpY_EQ and QpY_EP are derived as follows:

QpY_EQ=QpY+deltaQP(s(Cbq))

QpY_EP=QpY+deltaQP(s(Cbp))  (5)

The deltaQP(s(Cb)) variable offset is derived from a lookup table (LUT_DQP) consisting of the deltaQP values, accessed by an index derived by averaging the decoded samples of the block:

deltaQP(d(Cbq))=LUT_DQP(mean(d(Cbq)))

deltaQP(d(Cbp))=LUT_DQP(mean(d(Cbp)))  (6)

A variable qPL is derived as follows:

qPL=((QpQ+QpP+1)>>1)

8.7.2.5.5 Filtering Process for Chroma Block Edges

The variables QpQ and QpP are set equal to the QpY_EQ and QpY_EP values of the coding units which include the coding blocks containing the samples q0,0 and p0,0, respectively. QpY_EQ and QpY_EP are derived as follows:

QpY_EQ=QpY+deltaQP(s(Cbq))

QpY_EP=QpY+deltaQP(s(Cbp))  (7)

The deltaQP(s(Cb)) variable offset is derived from a lookup table (LUT_DQP) consisting of the deltaQP values, accessed by an index derived by averaging the decoded samples of the block:

deltaQP(d(Cbq))=LUT_DQP(mean(d(CbqY)),mean(d(CbqC)),alpha)

deltaQP(d(Cbp))=LUT_DQP(mean(d(CbpY)),mean(d(CbpC)),alpha)  (9)

where d(CbqY) and d(CbpY) are decoded luma block samples associated with chroma samples q0,0 and p0,0, belonging to d(CbqC) and d(CbpC), respectively. Alpha is a parameter specifying the specific LUT_DQP utilized for the current Cb. The alpha variable can be derived from syntax elements of the coded bitstream, the index of the current chroma component, the spatio-temporal neighborhood, or from decoded picture samples.

If ChromaArrayType is equal to 1, the variable QpC is determined as specified in Table 8-10 based on the index qPi, derived as follows:

qPi=((QpQ+QpP+1)>>1)+cQpPicOffset

In the example above, QpY is the same as QPb, and QpY_EQ is the same as QPe.
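Schematically, the modified luma-edge decision reduces to the following Python sketch (the sample containers and the LUT are illustrative assumptions; a conforming implementation would operate on the HEVC specification variables directly):

    def deblock_luma_qp(qp_y, d_cbq, d_cbp, lut_dqp):
        # Lift the signalled QpY (i.e., QPb) by each side's estimated deltaQP,
        # per equations (5) and (6) above.
        qp_y_eq = qp_y + lut_dqp[int(sum(d_cbq) / len(d_cbq))]  # Q-side block Cbq
        qp_y_ep = qp_y + lut_dqp[int(sum(d_cbp) / len(d_cbp))]  # P-side block Cbp
        # Average into qPL, which then drives the filter thresholds as in HEVC.
        return (qp_y_eq + qp_y_ep + 1) >> 1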

In another example, filter unit 88 may be configured to implement a bilateral filter. The bilateral filter modifies a sample based on a weighted average of the samples in its neighbourhood, and the weights are derived based on the distance of the neighbouring samples from the current sample and the difference in the sample values of the current sample and the neighbouring samples.

Let x be the location of a current sample value that is filtered based on samples in its neighbourhood N(x). For each sample d(y), for y belonging to N(x), let w(y,x) be the weight associated with the sample at location y used to obtain the filtered version of the sample at x. The filtered version of x, D(x), is obtained as:

D(x) = Σ_(y∈N(x)) w(y,x)·d(y)  (8)

The weights are derived as

w(y,x)=ƒ(y,x,d(y),d(x),QP(Cb))  (9)

where f( ) is the function that calculates the weights based on the sample locations and the sample values. The QP used to code the block containing the samples may also be an additional argument in the derivation of f( ). In some examples, the QP value of the block containing x is used as the argument to f( ). In this example, the QP value used as an additional argument in f( ) is QPe(Cb), which is derived as follows:

QPe(Cb)=QP(Cb)+deltaQP(d(Cb))  (10)

where QP(Cb) is the signalled QP value (e.g., QPb) for the coded block, and deltaQP(d(Cb)) is the QP offset obtained based on characteristics of the decoded block, e.g., its mean. Thus, the derived weights are as follows:

w(y,x)=ƒ(y,x,d(y),d(x),QPe(Cb))  (11)

In some examples, the weighting functions are derived separately for luma and chroma. The QP associated with the chroma coded blocks may also include the effect of chroma offsets that are derived or signalled in the bitstream, and the derived deltaQP( ) may be a function of the samples of one or more components.

In some examples, the QP used as an additional argument for f( ) may be obtained by taking into account the QPe( ) value derived for the coded block containing the sample at position x and the QPe( ) value derived for the coded block containing the sample at position y. For example, a value derived from the two QPe( ) values, e.g., their average, may be chosen as the argument for f( ).
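For illustration, a small Python sketch of a bilateral weight function that takes the estimated QPe as its extra argument; the Gaussian kernels and the QPe-dependent range spread are our assumptions, standing in for the otherwise unspecified f( ):

    import math

    def bilateral_weight(y, x, d_y, d_x, qpe, sigma_d=1.0):
        # Spatial term: falls off with the distance between positions y and x.
        dist2 = (y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2
        # Range term: falls off with the sample difference; a larger QPe widens the
        # kernel, so blocks quantized more coarsely are smoothed more strongly.
        sigma_r = 0.06 * qpe
        diff2 = (d_y - d_x) ** 2
        return math.exp(-dist2 / (2 * sigma_d ** 2) - diff2 / (2 * sigma_r ** 2))

    def bilateral_filter(x, neighborhood, d, qpe):
        # D(x) = sum over y in N(x) of w(y, x) * d(y), normalized here for stability.
        ws = [(bilateral_weight(y, x, d[y], d[x], qpe), d[y]) for y in neighborhood]
        total = sum(w for w, _ in ws)
        return sum(w * v for w, v in ws) / total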

In another example of the disclosure, video decoder 30 may be configured to use multiple LUT_DQP tables. In some examples, two or more LUT_DQP tables can be available at video decoder 30. Video decoder 30 may be configured to derive an index of a particular one of the two or more lookup tables to be used for a particular block edge. Video decoder 30 may be configured to derive the index from syntax elements, from coding information from blocks in the same spatio-temporal neighborhood of the current block, or from statistics of decoded picture samples.

For example:

deltaQP(d(Cbq))=LUT_DQP(d(Cbq),Idx1)

deltaQP(d(Cbp))=LUT_DQP(d(Cbp),Idx2)  (12)

where Idx1 and Idx2 are indices selecting among the several LUT_DQP tables available at video decoder 30.

In another example of the disclosure, video encoder 20 and video decoder 30 may be configured to apply spatially varying quantization with finer block granularity. In some examples, video encoder 20 may be configured to split a currently coded block Cb into sub-partitions, each of which is processed independently according to equations (2) and (3) above. Once the reconstructed signal r2 is produced for each of the partitions, the partitions together form the r2(Cb) data, which is further processed as shown in equation (4) above.

At video decoder 30, certain coding tools, e.g., deblocking, are modified to reflect this partitioning, even though it is not provided by the CU partitioning. For example, deblocking is called to filter these virtual block edges, in addition to the edges of TUs and PUs, as currently specified.

In some examples, the information about the finer granularity of the block partitioning can be signalled in syntax elements of the bitstream, e.g., the PPS, SPS, or slice header, or provided to the decoder as side information.

In some examples, constraints (e.g., effected by a clipping process) on maximal QP values, including deltaQP or chroma QP offset values, can be removed or extended to support a wider deviation of the QPe parameters from QPb in video coding architectures similar to HEVC.

The above-described techniques of this disclosure may provide the following advantages over other techniques. The above-described techniques of this disclosure may avoid deltaQP signaling, thus inherently providing a bitrate reduction of a few percent compared to the deltaQP-based method of supporting HDR/WCG video data.

The above-described techniques of this disclosure allow for equal scaling of all transform coefficients of t(Cb), in contrast to the techniques in “De-quantization and scaling for next generation containers,” J. Zhao, A. Segall, S.-H. Kim, K. Misra (Sharp), JVET document B0054, January 2016.

The above-described techniques of this disclosure may provide higher accuracy estimates of local brightness compared to the techniques in U.S. patent application Ser. No. 15/595,793, since decoded values provide a better estimate than predicted samples.

The above-described techniques of this disclosure may allow a finer granularity of deltaQP derivation and application without the increase in signaling overhead associated with deltaQP-based solutions.

The above-described techniques of this disclosure have a simpler implementation design compared to the transform-scaling-based designs of “De-quantization and scaling for next generation containers” and U.S. patent application Ser. No. 15/595,793.

FIG. 11 is a flowchart illustrating an example encoding method. Video encoder 20, including quantization unit 54, may be configured to perform the techniques of FIG. 11.

In one example of the disclosure, video encoder 20 may be configured to determine a base quantization parameter for the block of the video data (1100), and determine a quantization parameter offset for the block of the video data based on statistics associated with the block of the video data (1102). Video encoder 20 may be further configured to add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter (1104), and encode the block of the video data using the effective quantization parameter and the base quantization parameter (1106). In one example, the base quantization parameter is the same for all of the blocks of the video data. In one example, the sample values of the video data are defined by a high dynamic range video data color format.
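
A minimal Python sketch of steps (1100) through (1104) follows, assuming the block mean as the statistic and a hypothetical lookup table for the offset; neither the table contents nor the 10-bit depth is mandated by this disclosure.

    def derive_effective_qp(block_samples, qp_base, lut_dqp, bit_depth=10):
        # (1102): derive the offset from block statistics (the mean here).
        mean = sum(block_samples) // len(block_samples)
        bins = len(lut_dqp)
        offset = lut_dqp[min(mean * bins >> bit_depth, bins - 1)]
        # (1104): effective QP = base QP + content-dependent offset.
        return qp_base + offset

    # Example: a bright 10-bit block receives a negative offset, i.e.,
    # finer effective quantization (the table values are assumptions).
    qpe = derive_effective_qp([900] * 16, qp_base=32,
                              lut_dqp=[4, 2, 0, -2, -4, -6, -8, -10])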

In a further example of the disclosure, to encode the block of the video data, video encoder 20 may be further configured to predict the block of the video data to produce residual samples, transform the residual samples to create transform coefficients, quantize the transform coefficients with the effective quantization parameter, inverse quantize the quantized transform coefficients with the effective quantization parameter to produce distorted transform coefficients, inverse transform the distorted transform coefficients to produce distorted residual samples, transform the distorted residual samples, and quantize the transformed distorted residual samples using the base quantization parameter.
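
A minimal Python sketch of this two-stage pipeline follows; the identity transform and the HEVC-like scalar quantizer are stand-ins for the actual transform and quantization stages.

    def q(vals, qp):
        # Toy scalar quantizer with a HEVC-like QP-to-step mapping.
        step = 2 ** (qp / 6.0)
        return [round(v / step) for v in vals]

    def iq(levels, qp):
        # Matching toy inverse quantizer.
        step = 2 ** (qp / 6.0)
        return [l * step for l in levels]

    def two_stage_encode(original, prediction, qpe, qpb):
        residual = [o - p for o, p in zip(original, prediction)]
        # Stage 1: quantize and reconstruct with the effective QP
        # (identity transform used in place of the real transform).
        distorted_residual = iq(q(residual, qpe), qpe)
        # Stage 2: re-quantize the distorted residual with the uniform
        # base QP; these are the levels that are entropy coded.
        return q(distorted_residual, qpb)

    # Example: levels carried in the bitstream for a 3-sample block.
    levels = two_stage_encode([200, 120, 90], [150, 100, 80],
                              qpe=30, qpb=24)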

In another example of the disclosure, to determine the quantization parameter offset, video encoder 20 may be further configured to determine the quantization parameter offset from a lookup table.

FIG. 12 is a flowchart illustrating an example decoding method. Video decoder 30, including inverse quantization unit 76, QPe estimation unit 84, and filter unit 88, may be configured to perform the techniques of FIG. 12.

In one example of the disclosure, video decoder 30 may be configured to receive an encoded block of the video data, the encoded block of the video data having been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter (1200). Video decoder 30 may be further configured to determine the base quantization parameter used to encode the encoded block of the video data (1202), and decode the encoded block of the video data using the base quantization parameter to create a decoded block of video data (1204). Video decoder 30 may be further configured to determine an estimate of the quantization parameter offset for the decoded block of the video data based on statistics associated with the decoded block of the video data (1206), and add the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of the effective quantization parameter (1208). Video decoder 30 may be further configured to perform one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter (1210). In one example, the base quantization parameter is the same for all of the blocks of the video data. In another example, sample values of the video data are defined by a high dynamic range video data color format.
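
A minimal Python sketch of steps (1204) through (1208) follows; the identity transform, the mean statistic, and the lookup table contents are illustrative assumptions.

    def decode_and_estimate_qpe(levels, prediction, qpb, lut_dqp,
                                bit_depth=10):
        # (1204): inverse quantize with the base QP (identity
        # transform as a stand-in) and reconstruct the block.
        step = 2 ** (qpb / 6.0)
        decoded = [p + l * step for p, l in zip(prediction, levels)]
        # (1206): estimate the offset from decoded-sample statistics.
        mean = int(sum(decoded) / len(decoded))
        bins = len(lut_dqp)
        offset = lut_dqp[min(max(mean, 0) * bins >> bit_depth, bins - 1)]
        # (1208): estimated effective QP; the filters in (1210) can
        # use it with no deltaQP ever having been signaled.
        return decoded, qpb + offset

    # Example, continuing the encoder sketch above:
    decoded, qpe_est = decode_and_estimate_qpe(
        [4, 2, 0], [150, 100, 80], qpb=24,
        lut_dqp=[4, 2, 0, -2, -4, -6, -8, -10])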

In another example of the disclosure, to determine the base quantization parameter, video decoder 30 may be further configured to receive a base quantization parameter syntax element in an encoded video bitstream, a value of the base quantization parameter syntax element indicating the base quantization parameter.

In another example of the disclosure, to decode the block of the video data, video decoder 30 may be further configured to entropy decode the encoded block of the video data to determine quantized transform coefficients, inverse quantize the quantized transform coefficients using the base quantization parameter to create transform coefficients, inverse transform the transform coefficients to create residual values, and perform a prediction process on the residual values to create the decoded block of the video data.

In another example of the disclosure, to determine the estimate of the quantization parameter offset for the decoded block of the video data, video decoder 30 may be further configured to determine an average of sample values of the decoded block of the video data, and determine the estimate of the quantization parameter offset for the decoded block of the video data using the average of the sample values of the decoded block of the video data.

In another example of the disclosure, to determine the estimate of the quantization parameter offset, video decoder 30 may be further configured to determine the estimate of the quantization parameter offset from a lookup table, wherein the average of the sample values is an input to the lookup table.

In another example of the disclosure, video decoder 30 may be further configured to determine the lookup table from a plurality of lookup tables.

In another example of the disclosure, to perform the one or more filtering operations on the decoded block of the video data, video decoder 30 may be further configured to apply a deblocking filter to the decoded block of video data using the effective quantization parameter.

In another example of the disclosure, to perform the one or more filtering operations on the decoded block of the video data, video decoder 30 may be further configured to apply a bilateral filter to the decoded block of video data using the effective quantization parameter.

Certain aspects of this disclosure have been described with respect to HEVC, extensions of the HEVC standard, and examples of JEM and VVC for purposes of illustration. However, the techniques described in this disclosure may be useful for other video coding processes, including other standard or proprietary video coding processes not yet developed.

A video coder, as described in this disclosure, may refer to a video encoder or a video decoder. Similarly, a video coding unit may refer to a video encoder or a video decoder. Likewise, video coding may refer to video encoding or video decoding, as applicable.

It is to be recognized that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more DSPs, general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some examples, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

1. A method of decoding video data, the method comprising: receiving an encoded block of the video data; determining a base quantization parameter used to encode the encoded block of the video data; decoding the encoded block of the video data using the base quantization parameter to create a decoded block of video data; determining an estimate of a quantization parameter offset for the decoded block of the video data based on content of the decoded block of the video data; adding the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of an effective quantization parameter; and performing one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.
2. The method of claim 1, wherein the base quantization parameter is the same for all of the blocks of the video data.
3. The method of claim 1, wherein sample values of the video data are defined by a high dynamic range video data color format.
4. The method of claim 1, wherein determining the base quantization parameter comprises: receiving a base quantization parameter syntax element in an encoded video bitstream, a value of the base quantization parameter syntax element indicating the base quantization parameter.
5. The method of claim 1, wherein decoding the encoded block of the video data comprises: entropy decoding the encoded block of the video data to determine quantized transform coefficients; inverse quantizing the quantized transform coefficients using the base quantization parameter to create transform coefficients; inverse transforming the transform coefficients to create residual values; and performing a prediction process on the residual values to create the decoded block of the video data.
6. The method of claim 1, wherein determining the estimate of the quantization parameter offset for the decoded block of the video data comprises: determining an average of sample values of the decoded block of the video data; and determining the estimate of the quantization parameter offset for the decoded block of the video data using the average of the sample values of the decoded block of the video data.
7. The method of claim 6, wherein determining the estimate of the quantization parameter offset comprises: determining the estimate of the quantization parameter offset from a lookup table, wherein the average of the sample values is an input to the lookup table.
8. The method of claim 7, further comprising: determining the lookup table from a plurality of lookup tables.
9. The method of claim 1, wherein performing the one or more filtering operations on the decoded block of the video data comprises: applying a deblocking filter to the decoded block of video data using the effective quantization parameter.
10. The method of claim 1, wherein performing the one or more filtering operations on the decoded block of the video data comprises: applying a bilateral filter to the decoded block of video data using the effective quantization parameter.
11-15. (canceled)
16. An apparatus configured to decode video data, the apparatus comprising: a memory configured to store an encoded block of the video data; and one or more processors in communication with the memory, the one or more processors configured to: receive the encoded block of the video data; determine a base quantization parameter used to encode the encoded block of the video data; decode the encoded block of the video data using the base quantization parameter to create a decoded block of video data; determine an estimate of a quantization parameter offset for the decoded block of the video data based on content of the decoded block of the video data; add the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of an effective quantization parameter; and perform one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.
17. The apparatus of claim 16, wherein the base quantization parameter is the same for all of the blocks of the video data.
18. The apparatus of claim 16, wherein sample values of the video data are defined by a high dynamic range video data color format.
19. The apparatus of claim 16, wherein to determine the base quantization parameter, the one or more processors are further configured to: receive a base quantization parameter syntax element in an encoded video bitstream, a value of the base quantization parameter syntax element indicating the base quantization parameter.
20. The apparatus of claim 16, wherein to decode the block of the video data, the one or more processors are further configured to: entropy decode the encoded block of the video data to determine quantized transform coefficients; inverse quantize the quantized transform coefficients using the base quantization parameter to create transform coefficients; inverse transform the transform coefficients to create residual values; and perform a prediction process on the residual values to create the decoded block of the video data.
21. The apparatus of claim 16, wherein to determine the estimate of the quantization parameter offset for the decoded block of the video data, the one or more processors are further configured to: determine an average of sample values of the decoded block of the video data; and determine the estimate of the quantization parameter offset for the decoded block of the video data using the average of the sample values of the decoded block of the video data.
22. The apparatus of claim 21, wherein to determine the estimate of the quantization parameter offset, the one or more processors are further configured to: determine the estimate of the quantization parameter offset from a lookup table, wherein the average of the sample values is an input to the lookup table.
23. The apparatus of claim 21, wherein the one or more processors are further configured to: determine the lookup table from a plurality of lookup tables.
24. The apparatus of claim 16, wherein to perform the one or more filtering operations on the decoded block of the video data, the one or more processors are further configured to: apply a deblocking filter to the decoded block of video data using the effective quantization parameter.
25. The apparatus of claim 16, wherein to perform the one or more filtering operations on the decoded block of the video data, the one or more processors are further configured to: apply a bilateral filter to the decoded block of video data using the effective quantization parameter.
26-30. (canceled)
31. An apparatus configured to decode video data, the apparatus comprising: means for receiving an encoded block of the video data; means for determining a base quantization parameter used to encode the encoded block of the video data; means for decoding the encoded block of the video data using the base quantization parameter to create a decoded block of video data; means for determining an estimate of a quantization parameter offset for the decoded block of the video data based on content of the decoded block of the video data; means for adding the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of an effective quantization parameter; and means for performing one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.
 32. (canceled)
33. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to: receive an encoded block of video data; determine a base quantization parameter used to encode the encoded block of the video data; decode the encoded block of the video data using the base quantization parameter to create a decoded block of video data; determine an estimate of a quantization parameter offset for the decoded block of the video data based on content of the decoded block of the video data; add the estimate of the quantization parameter offset to the base quantization parameter to create an estimate of an effective quantization parameter; and perform one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter.
 34. (canceled)
35. The method of claim 1, further comprising: determining the estimate of the quantization parameter offset for the decoded block of the video data based on sample values of the decoded block of the video data.
36. The method of claim 35, further comprising: determining the estimate of the quantization parameter offset using a lookup table based on the sample values of the decoded block of the video data.
37. The apparatus of claim 31, wherein the means for determining the estimate of the quantization parameter offset for the decoded block of the video data based on content of the decoded block of the video data comprises means for determining the estimate of the quantization parameter offset for the decoded block of the video data based on sample values of the decoded block of the video data.
38. The apparatus of claim 37, wherein the means for determining the estimate of the quantization parameter offset for the decoded block of the video data based on content of the decoded block of the video data includes a lookup table stored in a memory.
39. The apparatus of claim 16, wherein the one or more processors are configured to determine the estimate of the quantization parameter offset for the decoded block of the video data based on sample values of the decoded block of the video data.
40. The apparatus of claim 39, wherein the one or more processors are configured to determine the estimate of the quantization parameter offset using a lookup table stored in the memory, based on the sample values of the decoded block of the video data.
41. The non-transitory computer-readable storage medium of claim 33, wherein the one or more processors are configured to determine the estimate of the quantization parameter offset for the decoded block of the video data based on sample values of the decoded block of the video data.
42. The non-transitory computer-readable storage medium of claim 41, wherein the one or more processors are configured to determine the estimate of the quantization parameter offset using a lookup table based on the sample values of the decoded block of the video data.